Introduction

Overview and Motivation

According to the World Wildlife Fund, forests cover more than 30% of the Earth’s land surface and provide food, medicine and fuel for more than a billion people. They are a resource, but they are also large areas of undeveloped land that can be converted for purposes such as agriculture and grazing. According to National Geographic, about half of the forests in the eastern part of North America were cut down for timber and farming between the 1600s and the late 1800s. Today, most deforestation happens in the tropics. The negative effects of deforestation include the loss of animal and plant species through the loss of their habitat, an increase in greenhouse gases, and lower water levels. Trees bring humidity to the land, so a decline in the number of trees reduces the humidity of a region, which leads to a higher risk of forest fires. Because these negative effects are substantial, we decided to focus our project on the areas affected by deforestation and on the causes of deforestation.

Research questions

We therefore developed the following research questions: Which countries are affected by deforestation? How does this differ by year? What are the causes of deforestation? Which of these causes have the strongest effect?

Data

Index for the different variables

In our analysis we’ll be using many variables. Some of them are taken directly from the original data sets and some are derived from existing variables. The aim of this “Data” section is to tidy our data sets one by one and extract the variables we are interested in. The following index makes it easy to find where we treated each variable of our final “merged” data set:

  1. Agricultural land

  2. Change in agricultural land

  3. Forest area

  4. Deforestation

  5. Agricultural exports

  6. Total land area

  7. Production / Extraction of sources of energy

  8. Population growth

  9. Openness to trade

  10. Soybeans production

  11. Merged Data Set


1) Agricultural Land

The “AgriLandData” data set shows the total area of agricultural land in square km per country/region for the years between 1960 and 2020. In this project we limit our analysis to the years from 2001 onwards. The data set covers 266 countries.


Country Year AgriLand
Aruba 1980 20
Aruba 1981 20
Aruba 1982 20
Aruba 1983 20
Aruba 1984 20
Aruba 1985 20
Aruba 1986 20


2) Change in Agricultural Land

To see how the area of agricultural land changed over time, we created two new columns containing the change compared with the year before, one in square km and one as a percentage.

Country Year AgriLand Change agricultural land Change agricultural land %
Africa Eastern and Southern 1981 5391340 400 0.007
Africa Eastern and Southern 1982 5391740 4280 0.079
Africa Eastern and Southern 1983 5396020 -40 -0.001
Africa Eastern and Southern 1984 5395980 21580 0.400
Africa Eastern and Southern 1985 5417560 7640 0.141
Africa Eastern and Southern 1986 5425200 3470 0.064
Africa Eastern and Southern 1987 5428670 5780 0.106
Africa Eastern and Southern 1988 5434450 11010 0.203
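
The two change columns can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code used for this report. The function name `add_changes` is hypothetical. The sample rows above are consistent with the change in a row being the following year's value minus that row's value; this alignment is inferred from the table and is treated as an assumption here.

```python
# Sample values copied from the "Africa Eastern and Southern" rows above.
agri_land = [
    (1981, 5391340),
    (1982, 5391740),
    (1983, 5396020),
    (1984, 5395980),
    (1985, 5417560),
]


def add_changes(series):
    """Return (year, value, change, change_pct) rows.

    Assumption inferred from the table: the change stored for a year is
    the next year's value minus that year's value, and the percentage is
    that change relative to the row's own value.
    """
    rows = []
    for (year, value), (_, next_value) in zip(series, series[1:]):
        change = next_value - value
        change_pct = round(change / value * 100, 3)
        rows.append((year, value, change, change_pct))
    return rows


changes = add_changes(agri_land)
for row in changes:
    print(row)
```

The last year in a series has no following value, so it gets no change row in this sketch.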


3) Forest Area

The “ForestArea” data set shows the total forest area in square km per country/region for the years between 1960 and 2020. In this project we are only interested in the years from 2001 onwards.

Country Year ForestArea
Africa Eastern and Southern 1990 4988232
Africa Eastern and Southern 1991 4968316
Africa Eastern and Southern 1992 4947641
Africa Eastern and Southern 1993 4911135
Africa Eastern and Southern 1994 4890839
Africa Eastern and Southern 1995 4870543
Africa Eastern and Southern 1996 4850247
Africa Eastern and Southern 1997 4829951


4) Deforestation

Using the “ForestAreaData” data set, we created two new columns containing the change compared with the year before, one in square km and one as a percentage: the “Deforestation” variable and the “Deforestation %” variable.


Country Year ForestArea Deforestation Deforestation %
Africa Eastern and Southern 1990 4988232 -19916 -0.004
Africa Eastern and Southern 1991 4968316 -20675 -0.004
Africa Eastern and Southern 1992 4947641 -36506 -0.007
Africa Eastern and Southern 1993 4911135 -20296 -0.004
Africa Eastern and Southern 1994 4890839 -20296 -0.004
Africa Eastern and Southern 1995 4870543 -20296 -0.004
Africa Eastern and Southern 1996 4850247 -20296 -0.004
Africa Eastern and Southern 1997 4829951 -20296 -0.004


5) Agricultural exports

The data set we used contains 149 indicators for 237 countries from 1960 to 2020. To give an impression of the data, we show the first rows and the first 7 columns of the data set.


Country Indicator Name 1960 1961 1962 1963 1964
Aruba Merchandise exports by the reporting economy, residual (% of total merchandise exports) NA NA NA NA NA
Aruba Merchandise exports to low- and middle-income economies in Sub-Saharan Africa (% of total merchandise exports) NA NA NA NA NA
Aruba Merchandise exports to low- and middle-income economies in South Asia (% of total merchandise exports) NA NA NA NA NA
Aruba Merchandise exports to low- and middle-income economies in Middle East & North Africa (% of total merchandise exports) NA NA NA NA NA
Aruba Merchandise exports to low- and middle-income economies in Latin America & the Caribbean (% of total merchandise exports) NA NA NA NA NA
Aruba Merchandise exports to low- and middle-income economies in Europe & Central Asia (% of total merchandise exports) NA NA NA NA NA
Aruba Merchandise exports to low- and middle-income economies in East Asia & Pacific (% of total merchandise exports) NA NA NA NA NA
Aruba Merchandise exports to low- and middle-income economies outside region (% of total merchandise exports) NA NA NA NA NA

Before selecting the variables we need, we restructured the data set so that the year and the individual indicator names are columns, and we kept only the data from 2000 onwards.

Country Year Travel services (% of commercial service exports) Transport services (% of commercial service exports) High-technology exports (% of manufactured exports) High-technology exports (current US$)
Aruba 1980 NA NA NA NA
Aruba 1981 NA NA NA NA
Aruba 1982 NA NA NA NA
Aruba 1983 NA NA NA NA
Aruba 1984 NA NA NA NA
Aruba 1985 NA NA NA NA
Aruba 1986 77.3 2.98 NA NA
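
The restructuring described above (years become rows, indicator names become columns) can be sketched like this. It is a plain-Python illustration with a hypothetical helper `reshape`, not the code used for this report. The four values are the Aruba figures for 2000 and 2001 shown in the food and agricultural raw material export tables of this section.

```python
# Wide layout: one row per (country, indicator), one column per year.
wide = [
    {"Country": "Aruba",
     "Indicator Name": "Food exports (% of merchandise exports)",
     "2000": 49.96, "2001": 45.08},
    {"Country": "Aruba",
     "Indicator Name": "Agricultural raw materials exports (% of merchandise exports)",
     "2000": 0.737, "2001": 0.549},
]


def reshape(rows, first_year=2000):
    """Turn year columns into rows and indicator names into columns."""
    tidy = {}
    for row in rows:
        for key, value in row.items():
            if not key.isdigit() or int(key) < first_year:
                continue  # skip id columns and years before the cut-off
            record = tidy.setdefault((row["Country"], int(key)), {})
            record[row["Indicator Name"]] = value
    return [
        {"Country": country, "Year": year, **indicators}
        for (country, year), indicators in sorted(tidy.items())
    ]


tidy_rows = reshape(wide)
for row in tidy_rows:
    print(row)
```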

Of all these variables, just two are of interest for our project. We therefore selected only these indicators and created a new data set for each of them. The result was:

Country Year Merchandise exports (current US$)
Albania 1992 70000000
Albania 1993 122000000
Albania 1994 140000000
Albania 1995 202000000
Albania 1996 207000000
Albania 1997 139000000
Albania 1998 205000000
Albania 1999 351000000


The next data set shows food exports as a percentage of total merchandise exports (in US$) per country and year.

Country Year Food exports (% of merchandise exports)
Aruba 2000 49.96
Aruba 2001 45.08
Aruba 2002 44.09
Aruba 2003 46.87
Aruba 2004 35.86
Aruba 2005 1.18
Aruba 2006 34.38

We also take a data set showing agricultural raw material exports as a percentage of total merchandise exports (in US$) per country and year.

Country Year Agricultural raw materials exports (% of merchandise exports)
Aruba 2000 0.737
Aruba 2001 0.549
Aruba 2002 0.993
Aruba 2003 1.172
Aruba 2004 1.249
Aruba 2005 0.317
Aruba 2006 1.096


6) Total Land Area

This data set shows the land area of every country in square kilometres. This makes it possible to calculate what percentage of a country's area is forest and what percentage is agricultural land.

Country Year Land Area in sq km
Aruba 1980 180
Aruba 1981 180
Aruba 1982 180
Aruba 1983 180
Aruba 1984 180
Aruba 1985 180
Aruba 1986 180
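
As a small worked example of these shares, the sketch below uses the Albania 1996 values from the merged data set in section 11 (forest area 7771 sq km, agricultural land 11310 sq km, land area 27400 sq km). The helper name `share_of_land` is hypothetical and the code is only an illustration in Python, not the code used for this report.

```python
def share_of_land(area_sq_km, land_sq_km):
    """Return an area as a percentage of the total land area."""
    if land_sq_km <= 0:
        raise ValueError("land area must be positive")
    return area_sq_km / land_sq_km * 100


# Albania, 1996 (values from the merged data set)
forest_share = share_of_land(7771, 27400)
agri_share = share_of_land(11310, 27400)
print(round(forest_share, 1), round(agri_share, 1))
```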


7) Extraction of Minerals Data

This data set contains the amounts of the different sources of energy extracted worldwide. It was a difficult data set to find for the whole world, and it was also a hard one to tidy, for the following reason.

After running the first four lines of code to delete the unnecessary information lines and name the columns, we arrive at this stage:

ExtractVar 1980 1981 1982 1983
Algeria
Production (quad Btu) 2.803017355026457 3.0375368604701833 3.224933778884667 3.6064004828010217
Coal (quad Btu) 0.0000759200610490114 0.00024041352665520277 0.00040490699226139416 0.0005694004578675855
Natural gas (quad Btu) 0.48498 0.91096 1.11156 1.55052
Petroleum and other liquids (quad Btu) 2.3153852101494885 2.122552459995088 2.1080136736977853 2.052859918102554
Nuclear, renewables, and other (quad Btu) 0.00257622481592 0.00378398694844 0.00495519819462 0.0024511642406
Nuclear (quad Btu)
Renewables and other (quad Btu) 0.00257622481592 0.00378398694844 0.00495519819462 0.0024511642406

Now we can see the problem: the name of the country appears as a title in the same column as the variable names we need. At this stage we are therefore unable to pivot our tibble. To solve this, we created a new column “Country”: we took every eighth row of the first column, which holds the country name, and copied it to all the variable rows of that country. Then we deleted the rows that contained only the country name, which is now stored in the “Country” column.



Now we are finally able to pivot our tibble to have a tidy data set usable for the rest of our analysis:

Country Year Production (quad Btu) Coal (quad Btu) Natural gas (quad Btu) Petroleum and other liquids (quad Btu) Nuclear, renewables, and other (quad Btu) Nuclear (quad Btu) Renewables and other (quad Btu)
World 1980 296.21435254624544 79.9919425299794 54.761045594 133.11110886607943 28.350255556186585 7.575700462108056 20.77455509407853
World 1981 291.2691434621497 80.44275739096139 55.573536747 125.4389494908981 29.81389983329021 8.527153469041346 21.286746364248867
World 1982 290.16459397739266 83.44334341199846 55.495522664 119.76280547269802 31.46292242869618 9.50768642751678 21.9552360011794
World 1983 293.0884301960975 83.72816748576506 56.115951591 119.26919636784994 33.97511475148248 10.718344095623602 23.25677065585888
World 1984 308.88419722061724 87.63254400981725 61.758597532 122.55382660121982 36.93922907758015 12.994607878980153 23.9446211986
World 1985 316.3867487746443 91.60218713955003 64.124522824 121.1347896143825 39.52524919671177 15.298615144911766 24.2266340518
World 1986 326.69952978342883 93.99118261682449 65.3286482 126.54884517346557 40.830853793138765 16.247905140338766 24.5829486528
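
The two tidying steps described above (carrying the country name down to its variable rows, then pivoting) can be sketched as follows. This is a plain-Python illustration, not the code used for this report. The names `ROWS_PER_COUNTRY` and `tidy_energy` are hypothetical, and the Algeria values are rounded from the excerpt above. The country row is found by position rather than by emptiness, because a variable row such as “Nuclear (quad Btu)” can also be empty.

```python
ROWS_PER_COUNTRY = 8  # one title row with the country name + seven variables

header = ["ExtractVar", "1980", "1981"]
raw = [
    ["Algeria", "", ""],
    ["Production (quad Btu)", "2.803", "3.038"],
    ["Coal (quad Btu)", "0.0000759", "0.00024"],
    ["Natural gas (quad Btu)", "0.48498", "0.91096"],
    ["Petroleum and other liquids (quad Btu)", "2.315", "2.123"],
    ["Nuclear, renewables, and other (quad Btu)", "0.00258", "0.00378"],
    ["Nuclear (quad Btu)", "", ""],
    ["Renewables and other (quad Btu)", "0.00258", "0.00378"],
]


def tidy_energy(header, rows):
    """Return one dict per (country, year) with one key per variable."""
    years = header[1:]
    tidy = {}
    country = None
    for index, row in enumerate(rows):
        label, values = row[0], row[1:]
        if index % ROWS_PER_COUNTRY == 0:
            country = label  # title row: remember the name, emit nothing
            continue
        for year, value in zip(years, values):
            record = tidy.setdefault((country, int(year)), {})
            record[label] = float(value) if value else None  # empty -> NA
    return [{"Country": c, "Year": y, **v} for (c, y), v in tidy.items()]


energy = tidy_energy(header, raw)
for row in energy:
    print(row["Country"], row["Year"], row["Natural gas (quad Btu)"])
```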


8) Population Growth Data

This data set measures population growth, derived from the total population. We use this variable to look for a potential link with deforestation. To tidy it, we removed the unnecessary lines and, once the column names were set to the years, pivoted our tibble so that all years form one column, that is, one variable. The values are the percentage growth compared with the previous year, from 1960 to 2020, across 266 countries.

Country Year PopulationGrowth
Aruba 1980 0.208
Aruba 1981 0.769
Aruba 1982 1.280
Aruba 1983 1.412
Aruba 1984 0.981
Aruba 1985 0.315
Aruba 1986 -0.603


9) Openness to Trade Data

To capture how open a country is to trade, the best proxy variable we found is the ratio of exports and imports to GDP, again taken each year for each country.


Country Year Export/Import of GDP
Afghanistan 1980 25.9
Afghanistan 1981 25.7
Afghanistan 1982 26.0
Afghanistan 1983 25.8
Afghanistan 1984 25.8
Afghanistan 1985 25.9
Afghanistan 1986 25.9


10) Soybeans dataset

This data set shows the amount of soybeans produced per year for each country, measured as gross production value.

Country Year Gross Production Value Soybeans
Albania 1993 151
Albania 1994 31
Albania 1995 44
Albania 1996 131
Albania 1997 76
Albania 1998 49
Albania 1999 523


11) Merged Data

Finally, we can merge our different variables into one data set. Country and year together serve as the primary key of this data set.

Country Year AgriLand Change agricultural land Change agricultural land % Agricultural raw materials exports (% of merchandise exports) Food exports (% of merchandise exports) ForestArea Deforestation Deforestation % Land Area in sq km PopulationGrowth Gross Production Value Soybeans Export/Import of GDP Production (quad Btu) Coal (quad Btu) Natural gas (quad Btu) Petroleum and other liquids (quad Btu) Nuclear, renewables, and other (quad Btu) Nuclear (quad Btu) Renewables and other (quad Btu)
Albania 1996 11310 40 0.354 9.03 11.09 7771 -19.5 -0.003 27400 -0.622 131 48.7 0.08269987229832255 0.0010542168686930385 0.0011060658 0.021922129629629517 0.05861746 0.05861746
Albania 1997 11350 40 0.352 14.06 11.11 7752 -19.5 -0.003 27400 -0.625 76 50.0 0.07303457152971403 0.0003638447599914026 0.0007373772 0.02109303556972262 0.050840314 0.050840314
Albania 1998 11390 60 0.527 8.76 9.82 7732 -19.5 -0.003 27400 -0.629 49 52.6 0.06574110056542558 0.00045713828819432637 0.0011060658 0.014498112477231255 0.049679784 0.049679784
Albania 1999 11450 -10 -0.087 4.62 5.53 7712 -19.5 -0.003 27400 -0.633 523 55.5 0.06751718849892084 0.00045713828819432637 0.0007373772 0.012830467010726508 0.053492206 0.053492206
Albania 2000 11440 -50 -0.437 5.97 6.63 7693 12.8 0.002 27400 -0.637 292 63.5 0.06083183808831241 0.00027988058460877126 0.0011060658 0.01305174370370364 0.046394148 0.046394148
Albania 2001 11390 10 0.088 5.54 5.79 7706 12.8 0.002 27400 -0.938 389 66.5 0.05136107356866748 0.00019591640922613987 0.0011060658 0.013697264359441338 0.036361827 0.036361827
Albania 2002 11400 -190 -1.667 6.52 3.56 7719 12.8 0.002 27400 -0.300 200 68.5 0.051256018118918355 0.00013994029230438563 0.0011060658 0.014638491026613971 0.035371521 0.035371521
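
Merging on the (Country, Year) key can be sketched as below. This is a plain-Python illustration with a hypothetical helper `merge_on_keys`, not the code used for this report. It assumes an inner join, so only keys present in every table are kept; the report does not state which join type was used. The Albania values are taken from the tables in this section, and the short column name `Soy` is made up for the example.

```python
agri = [
    {"Country": "Albania", "Year": 1996, "AgriLand": 11310},
    {"Country": "Albania", "Year": 1997, "AgriLand": 11350},
]
forest = [
    {"Country": "Albania", "Year": 1996, "ForestArea": 7771},
    {"Country": "Albania", "Year": 1997, "ForestArea": 7752},
]
soy = [
    {"Country": "Albania", "Year": 1997, "Soy": 76},
    {"Country": "Albania", "Year": 1998, "Soy": 49},
]


def merge_on_keys(*tables):
    """Inner-join tables of dict rows on the (Country, Year) key."""
    indexed = [{(r["Country"], r["Year"]): r for r in t} for t in tables]
    shared = set(indexed[0]).intersection(*indexed[1:])
    merged = []
    for key in sorted(shared):
        row = {}
        for table in indexed:
            row.update(table[key])
        merged.append(row)
    return merged


merged = merge_on_keys(agri, forest, soy)
print(merged)
```

Only 1997 survives here, because it is the only year present in all three sample tables.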


Exploratory data analysis

To give an overview of the variables and their distributions, the table below shows the mean and the median of each variable. It also shows the number of observations that are not NA and the proportion of NA values per variable.

Summary Statistics
Variable NotNA Mean Median PropNA
agriland 1441 486836 142936 0
chagri 1441 -216.899 0 0
chagrip 1441 0.041 0 0
agriexp 1441 4.05 1.949 0
foodexp 1441 22.637 13.565 0
forest 1441 373573.327 88962.82 0
deforest 1441 -593.54 -31.986 0
deforestp 1441 -0.001 0 0
land 1441 1151848.046 298170 0
pop 1441 1.171 1.19 0
soy 1441 705496.72 8039 0
opentr 1441 67.545 60.192 0
prod 1441 3.515 0.515 0
coal 1441 1.631 0.004 0
gas 1441 0.431 0.012 0
petrol 1441 0.8 0.033 0
nuclrew 1441 0.653 0.101 0
nuc 461 0.665 0.153 0.68
rew 1441 0.44 0.073 0

Comparing the mean and the median tells us something about the distribution of a variable. In a perfectly symmetrical distribution, the mean and the median are the same. If the mean is lower than the median, the distribution is typically skewed to the left, and if the mean is higher than the median, it is typically skewed to the right.
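
The columns of the summary table and the mean/median comparison can be sketched as follows. This is a plain-Python illustration, not the code used for this report. The helper names `summarise` and `skew_hint` and the sample values are hypothetical. The last line only checks that the reported PropNA of 0.68 for `nuc` is consistent with 461 non-NA values out of 1441 rows.

```python
from statistics import mean, median


def summarise(values):
    """Return NotNA, Mean, Median and PropNA for a list with None as NA."""
    present = [v for v in values if v is not None]
    return {
        "NotNA": len(present),
        "Mean": round(mean(present), 3),
        "Median": round(median(present), 3),
        "PropNA": round(1 - len(present) / len(values), 2),
    }


def skew_hint(summary):
    """Rule of thumb only: compare mean and median."""
    if summary["Mean"] > summary["Median"]:
        return "right"
    if summary["Mean"] < summary["Median"]:
        return "left"
    return "symmetric"


sample = [1, 2, 3, 4, 40, None]  # hypothetical values with one NA
summary = summarise(sample)
print(summary, skew_hint(summary))

nuc_prop_na = round(1 - 461 / 1441, 2)  # consistency check for "nuc"
print(nuc_prop_na)
```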

Because we did not have data for all the countries of the world, these are the countries included in our analysis:

Albania China Kazakhstan Rwanda
Angola Colombia Madagascar Serbia
Argentina Croatia Malawi Slovenia
Australia Ecuador Mali South Africa
Austria El Salvador Mexico Spain
Azerbaijan Ethiopia Morocco Sri Lanka
Bangladesh France Nepal Suriname
Belize Georgia Nicaragua Switzerland
Bhutan Germany Nigeria Tajikistan
Bosnia and Herzegovina Greece North Macedonia Thailand
Brazil Guatemala Pakistan Togo
Bulgaria Honduras Panama Turkey
Burkina Faso Hungary Paraguay Uganda
Burundi India Peru Ukraine
Cambodia Indonesia Philippines Uruguay
Cameroon Italy Poland Zambia
Canada Japan Romania Zimbabwe

Deforestation per country

Now let's look at the most important variable in our data set, which shows the amount of deforestation in sq km. First we show the affected countries in the year 2017:

Country Year deforest deforestp
Brazil 2017 -10402.000 -0.002
Indonesia 2017 -6055.300 -0.006
Angola 2017 -5550.600 -0.008
Paraguay 2017 -2793.400 -0.016
Colombia 2017 -1992.900 -0.003
Zambia 2017 -1882.100 -0.004
Nigeria 2017 -1633.000 -0.007
Peru 2017 -1587.400 -0.002
Cambodia 2017 -1556.900 -0.018
Mexico 2017 -1277.700 -0.002
Argentina 2017 -1080.000 -0.004
Nicaragua 2017 -1000.000 -0.027
Ethiopia 2017 -730.000 -0.004
Ecuador 2017 -642.700 -0.005
Cameroon 2017 -560.000 -0.003
Burkina Faso 2017 -500.000 -0.008
Zimbabwe 2017 -460.700 -0.003
Malawi 2017 -420.000 -0.018
Pakistan 2017 -413.400 -0.011
Canada 2017 -369.800 0.000
South Africa 2017 -364.000 -0.002
Thailand 2017 -360.000 -0.002
Honduras 2017 -209.500 -0.003
Madagascar 2017 -132.200 -0.001
Australia 2017 -123.000 0.000
Suriname 2017 -122.400 -0.001
Panama 2017 -114.100 -0.003
Belize 2017 -111.500 -0.009
El Salvador 2017 -45.000 -0.008
Sri Lanka 2017 -31.600 -0.001
Togo 2017 -29.600 -0.002
Slovenia 2017 -20.300 -0.002
Hungary 2017 -13.500 -0.001
Albania 2017 -0.025 0.000
Bosnia and Herzegovina 2017 0.000 0.000
Burundi 2017 0.000 0.000
Georgia 2017 0.000 0.000
Germany 2017 0.000 0.000
Greece 2017 0.000 0.000
Japan 2017 0.000 0.000
Mali 2017 0.000 0.000
Nepal 2017 0.000 0.000
North Macedonia 2017 0.000 0.000
Romania 2017 0.000 0.000
Rwanda 2017 10.000 0.004
Croatia 2017 25.016 0.001
Switzerland 2017 34.400 0.003
Austria 2017 35.900 0.001
Spain 2017 42.900 0.000
Ukraine 2017 70.000 0.001
Morocco 2017 100.000 0.002
Azerbaijan 2017 115.301 0.011
Poland 2017 120.000 0.001
Bulgaria 2017 130.000 0.003
Uruguay 2017 210.000 0.011
Kazakhstan 2017 292.412 0.009
Philippines 2017 348.800 0.005
Italy 2017 538.100 0.006
France 2017 834.000 0.005
Turkey 2017 1559.000 0.007
India 2017 2664.000 0.004
China 2017 18795.700 0.009

In 2017, Brazil, Indonesia, Angola, Paraguay, Colombia, Zambia, Nigeria, Peru and Cambodia were the countries most affected by deforestation, while the forest area in China and India grew.
Two years earlier, in 2015, the amount of deforestation per country was:

Country Year deforest deforestp
Brazil 2015 -18027.000 -0.004
Angola 2015 -5550.700 -0.008
Zambia 2015 -1883.000 -0.004
Peru 2015 -1842.500 -0.003
Colombia 2015 -1751.700 -0.003
Paraguay 2015 -1658.100 -0.009
Nigeria 2015 -1633.021 -0.007
Cambodia 2015 -1556.900 -0.018
Mexico 2015 -1277.600 -0.002
Nicaragua 2015 -1000.000 -0.026
Argentina 2015 -870.000 -0.003
Ethiopia 2015 -730.000 -0.004
Pakistan 2015 -643.600 -0.016
Ecuador 2015 -642.700 -0.005
Cameroon 2015 -560.000 -0.003
Burkina Faso 2015 -500.000 -0.008
Zimbabwe 2015 -460.700 -0.003
Thailand 2015 -440.000 -0.002
Malawi 2015 -420.000 -0.017
Canada 2015 -396.700 0.000
South Africa 2015 -364.000 -0.002
Honduras 2015 -207.800 -0.003
Madagascar 2015 -132.100 -0.001
Panama 2015 -114.100 -0.003
Suriname 2015 -113.500 -0.001
Belize 2015 -111.500 -0.008
El Salvador 2015 -45.000 -0.007
Japan 2015 -40.000 0.000
Sri Lanka 2015 -31.600 -0.001
Hungary 2015 -20.900 -0.001
Slovenia 2015 -20.300 -0.002
Albania 2015 -0.075 0.000
Greece 2015 -0.030 0.000
Bangladesh 2015 0.000 0.000
Burundi 2015 0.000 0.000
Georgia 2015 0.000 0.000
Germany 2015 0.000 0.000
Nepal 2015 0.000 0.000
Turkey 2015 0.000 0.000
Rwanda 2015 20.000 0.007
Croatia 2015 21.202 0.001
Switzerland 2015 34.400 0.003
Austria 2015 36.000 0.001
Spain 2015 40.600 0.000
Ukraine 2015 70.000 0.001
North Macedonia 2015 72.700 0.007
Bulgaria 2015 80.000 0.002
Azerbaijan 2015 95.882 0.009
Morocco 2015 151.000 0.003
Bosnia and Herzegovina 2015 160.100 0.007
Uruguay 2015 270.000 0.014
Romania 2015 280.900 0.004
Kazakhstan 2015 292.538 0.009
Philippines 2015 348.900 0.005
Italy 2015 538.100 0.006
France 2015 834.000 0.005
Indonesia 2015 2439.000 0.003
India 2015 2664.000 0.004
Australia 2015 9427.000 0.007
China 2015 21656.170 0.010

Already in 2015, Brazil, Angola, Zambia, Peru, Cambodia, Paraguay, Colombia and Nigeria were the countries most affected by deforestation, and the forest area in China and India grew. In 2015 the forest area of Australia and Indonesia also grew.
Going back another five years, to 2010, the following countries were affected by deforestation:

Country Year deforest deforestp
Brazil 2010 -15391.80 -0.003
Indonesia 2010 -9262.60 -0.009
Angola 2010 -5550.62 -0.008
Paraguay 2010 -4142.54 -0.021
Cambodia 2010 -3484.82 -0.033
Argentina 2010 -2234.00 -0.007
Zambia 2010 -1881.80 -0.004
Peru 2010 -1710.54 -0.002
Nigeria 2010 -1633.06 -0.007
Colombia 2010 -1346.26 -0.002
Mexico 2010 -1224.80 -0.002
Ethiopia 2010 -730.00 -0.004
Nicaragua 2010 -561.24 -0.013
Cameroon 2010 -560.00 -0.003
Burkina Faso 2010 -500.20 -0.007
Zimbabwe 2010 -460.70 -0.003
Malawi 2010 -420.00 -0.016
Ecuador 2010 -418.02 -0.003
Canada 2010 -413.00 0.000
South Africa 2010 -364.00 -0.002
Pakistan 2010 -322.26 -0.008
Honduras 2010 -223.28 -0.003
Madagascar 2010 -132.18 -0.001
Belize 2010 -117.12 -0.008
Panama 2010 -114.16 -0.003
Suriname 2010 -96.38 -0.001
El Salvador 2010 -45.00 -0.007
Japan 2010 -44.00 0.000
Thailand 2010 -24.00 0.000
Bangladesh 2010 -9.88 -0.001
Georgia 2010 0.00 0.000
Greece 2010 0.00 0.000
Mali 2010 0.00 0.000
Nepal 2010 0.00 0.000
Slovenia 2010 2.00 0.000
Croatia 2010 4.00 0.000
Rwanda 2010 10.00 0.004
Albania 2010 14.23 0.002
Bhutan 2010 19.78 0.001
Germany 2010 20.00 0.000
Morocco 2010 20.24 0.000
Hungary 2010 28.86 0.001
Switzerland 2010 34.38 0.003
Austria 2010 35.98 0.001
Sri Lanka 2010 50.40 0.002
North Macedonia 2010 67.94 0.007
Azerbaijan 2010 90.78 0.009
Bosnia and Herzegovina 2010 115.68 0.006
Burundi 2010 171.40 0.088
Poland 2010 182.00 0.002
Bulgaria 2010 192.00 0.005
Ukraine 2010 218.00 0.002
Philippines 2010 348.86 0.005
Uruguay 2010 377.40 0.022
Kazakhstan 2010 452.55 0.015
Italy 2010 538.08 0.006
Romania 2010 771.92 0.012
France 2010 834.00 0.005
Turkey 2010 1094.44 0.005
India 2010 2664.00 0.004
Australia 2010 7096.80 0.005
China 2010 19367.74 0.010

In 2010, Brazil, Angola, Paraguay, Cambodia, Zambia, Peru, Nigeria and Colombia were again the countries most affected by deforestation, while the forest area of India, China and Australia grew. In 2010 Indonesia was also affected, and Argentina was affected more strongly than in the years discussed above.
Next we look at the year 2005, with the map of deforestation and the corresponding table:

Country Year deforest deforestp
Brazil 2005 -39507.90 -0.007
Paraguay 2005 -3421.44 -0.016
Argentina 2005 -3164.00 -0.010
Australia 2005 -2268.00 -0.002
Colombia 2005 -1927.71 -0.003
Indonesia 2005 -1620.80 -0.002
Mexico 2005 -1438.04 -0.002
Peru 2005 -1248.03 -0.002
Nicaragua 2005 -1211.17 -0.025
Ethiopia 2005 -730.00 -0.004
Ecuador 2005 -702.31 -0.005
Cameroon 2005 -697.01 -0.003
Burkina Faso 2005 -500.00 -0.007
Canada 2005 -479.76 0.000
Philippines 2005 -469.54 -0.007
Madagascar 2005 -468.69 -0.004
Malawi 2005 -420.00 -0.015
Pakistan 2005 -417.53 -0.010
South Africa 2005 -364.00 -0.002
Zambia 2005 -358.00 -0.001
Honduras 2005 -202.80 -0.003
Cambodia 2005 -191.77 -0.002
Panama 2005 -114.15 -0.003
Kazakhstan 2005 -74.75 -0.002
Belize 2005 -67.91 -0.005
Sri Lanka 2005 -62.80 -0.003
El Salvador 2005 -45.00 -0.007
Suriname 2005 -41.01 0.000
Bangladesh 2005 -31.99 -0.002
Rwanda 2005 -22.00 -0.008
Bosnia and Herzegovina 2005 -8.99 0.000
Burundi 2005 0.00 0.000
Mali 2005 0.00 0.000
North Macedonia 2005 2.88 0.000
Albania 2005 12.77 0.002
Slovenia 2005 14.00 0.001
Austria 2005 25.06 0.001
Croatia 2005 35.00 0.002
Ukraine 2005 38.00 0.000
Switzerland 2005 38.54 0.003
Azerbaijan 2005 45.27 0.004
Georgia 2005 61.80 0.002
Japan 2005 90.00 0.000
Bhutan 2005 99.29 0.004
Hungary 2005 125.22 0.006
Romania 2005 149.00 0.002
Morocco 2005 168.03 0.003
Poland 2005 270.00 0.003
Greece 2005 301.57 0.008
Bulgaria 2005 362.00 0.010
Uruguay 2005 362.30 0.023
Italy 2005 658.79 0.008
Turkey 2005 934.73 0.005
Thailand 2005 1075.00 0.006
France 2005 1131.00 0.007
Spain 2005 1451.41 0.008
India 2005 1905.00 0.003
China 2005 23609.83 0.013

In 2005 the most affected countries were likewise Brazil, Paraguay, Argentina, Colombia and Indonesia, and the forest area in China and India was already growing. In 2005 Australia was also affected by deforestation.
The last map shows the deforestation in sq km in the year 2001:

Country Year deforest deforestp
Brazil 2001 -39507.90 -0.007
Paraguay 2001 -3421.44 -0.015
Argentina 2001 -3164.00 -0.010
Australia 2001 -2268.00 -0.002
Colombia 2001 -1927.71 -0.003
Nigeria 2001 -1633.06 -0.007
Indonesia 2001 -1620.80 -0.002
Mexico 2001 -1438.04 -0.002
Peru 2001 -1248.03 -0.002
Nicaragua 2001 -1211.17 -0.023
Ethiopia 2001 -730.00 -0.004
Ecuador 2001 -702.31 -0.005
Cameroon 2001 -697.01 -0.003
Burkina Faso 2001 -500.00 -0.007
Canada 2001 -479.76 0.000
Philippines 2001 -469.54 -0.006
Madagascar 2001 -468.69 -0.004
Pakistan 2001 -417.53 -0.009
South Africa 2001 -364.00 -0.002
Honduras 2001 -202.80 -0.003
Cambodia 2001 -191.77 -0.002
Panama 2001 -114.15 -0.003
Kazakhstan 2001 -74.75 -0.002
Belize 2001 -67.91 -0.005
Sri Lanka 2001 -62.80 -0.003
El Salvador 2001 -45.00 -0.007
Suriname 2001 -41.01 0.000
Rwanda 2001 -22.00 -0.008
Burundi 2001 0.00 0.000
Mali 2001 0.00 0.000
North Macedonia 2001 2.88 0.000
Albania 2001 12.77 0.002
Slovenia 2001 14.00 0.001
Austria 2001 25.06 0.001
Croatia 2001 35.00 0.002
Ukraine 2001 38.00 0.000
Switzerland 2001 38.54 0.003
Azerbaijan 2001 45.27 0.005
Germany 2001 55.00 0.000
Georgia 2001 61.80 0.002
Japan 2001 90.00 0.000
Hungary 2001 125.22 0.006
Romania 2001 149.00 0.002
Morocco 2001 168.03 0.003
Greece 2001 301.57 0.008
Bulgaria 2001 362.00 0.011
Uruguay 2001 362.30 0.026
Italy 2001 658.79 0.008
Turkey 2001 934.73 0.005
Thailand 2001 1075.00 0.006
France 2001 1131.00 0.007
Spain 2001 1451.41 0.008
India 2001 1905.00 0.003
China 2001 23609.83 0.013

Already in 2001, Brazil, Paraguay, Argentina, Colombia, Nigeria and Indonesia were the countries most affected by deforestation, while the forest area in China and India grew.
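
Each of the yearly tables above is the `deforest` column filtered to one year and sorted in ascending order, so that the largest forest losses come first. A minimal plain-Python sketch of that step, with a hypothetical helper `most_affected` and a handful of rows copied from the 2015 and 2017 tables (this is not the code used for this report):

```python
# (country, year, deforest in sq km); negative values are forest loss.
rows = [
    ("Brazil", 2017, -10402.0),
    ("China", 2017, 18795.7),
    ("Indonesia", 2017, -6055.3),
    ("Brazil", 2015, -18027.0),
    ("Angola", 2017, -5550.6),
]


def most_affected(rows, year, top=3):
    """Return up to `top` countries with the largest forest loss in `year`."""
    in_year = [r for r in rows if r[1] == year]
    in_year.sort(key=lambda r: r[2])  # most negative first
    return [country for country, _, value in in_year[:top] if value < 0]


top_2017 = most_affected(rows, 2017)
print(top_2017)
```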

Now that we have identified the most interesting countries with regard to yearly deforestation, we take a closer look at them.

Brazil has the highest yearly deforestation of all countries in our data set, even though the amount decreased to ca. 15000 sq km in 2010 and to 10402 sq km in 2017.

The yearly amount of deforestation in Indonesia increased in 2010 from 1620 sq km to 9262 sq km. In 2015 the forest area grew by 2439 sq km, but one year later the country was again affected by deforestation of 13220 sq km.

In Angola the yearly amount of deforestation was around 5550 sq km from 2010 to 2017. For the years before 2010 we do not have any data in our data set.

The yearly amount of deforestation in Paraguay increased in 2010 from 3421 to 4142 sq km. In 2015 the amount of deforestation was just 1658 sq km, but one year later it was very high again at 3928 sq km.

In Colombia the yearly amount of deforestation decreased in 2010 from around 2000 sq km to 1300 sq km and increased again in 2016 to above 2000 sq km.

In Zambia we can see an increase in the yearly amount of deforestation between 2009 and 2010, from around 300 to 1800 sq km.

In Nigeria the yearly amount of deforestation was nearly the same in all years from 2001 to 2017.

In Peru the yearly amount of deforestation was around 1200 sq km in the years 2001 to 2009 and between 1500 and 1800 sq km from 2010 to 2017.

In Cambodia the yearly amount of deforestation increased between 2009 and 2010 from under 200 sq km to above 3000 sq km and decreased in 2015 to around 1500 sq km.

In Argentina the yearly amount of deforestation decreased in 2009 from around 3000 to around 2000 sq km and in 2015 to around 1000 sq km.

From 2001 to 2009 Australia was affected by deforestation of above 2000 sq km each year. From 2010 to 2015 the forest area grew, but in 2016 and 2017 the country was again affected by deforestation, even if only to a small extent.
We again visualized the distribution of the deforestation variable with a boxplot. The interquartile range is very small because there are many extreme outliers, and we already know which countries most of these outliers are. The lowest deforestation (that is, the largest forest gain) is found for China, India and, in 2010 and 2015, Australia, while the highest deforestation is found for countries such as Brazil, Paraguay and Angola.

At first we were sceptical about the large increase in forest area in China and India, but our research led us to a paper on the leading role of China and India in the greening of the world (Chen et al., 2019).

Reference: Chen et al. (2019). China and India lead in greening of the world through land-use management. Nature Sustainability, (2) 122–129.

In our analysis we are only interested in the absolute amount of deforestation, so we do not take the percentage of deforestation into account. We know that absolute values do not allow us to examine deforestation independently of a country's forest area, but we think that the amount of forest area also plays an important role in real-world decisions regarding deforestation.

The other problem is that our variable of interest (deforest) includes two opposite effects. One is deforestation, and the other is that some countries adopted a green policy and planted trees to reforest their country. Our goal with this regression analysis is to identify the factors behind deforestation. As both effects are included in the same variable (deforest), it is hard to tell them apart. One option is to keep the value only for observations with positive deforestation and set all other values to zero; setting them to zero rather than dropping them preserves the total number of observations.

After this data transformation, the boxplot no longer shows any outliers below the box. We keep the outliers above it, because these are the most interesting observations.
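
The transformation described above can be sketched as follows. This is a plain-Python illustration with a hypothetical helper `forest_loss_only`, not the code used for this report. It assumes the sign convention of the tables in this section, where a negative `deforest` value is a forest loss, and flips the sign so that deforestation becomes a positive number; whether the report flips the sign in the same way is an assumption.

```python
def forest_loss_only(deforest):
    """Map net forest change to loss: gains become 0, losses become positive.

    Assumption: negative input means forest loss, as in the tables above.
    """
    return max(0.0, -deforest)


# 2017 values from the table above (sq km).
deforest_2017 = {"Brazil": -10402.0, "China": 18795.7, "Germany": 0.0}
loss_2017 = {country: forest_loss_only(v) for country, v in deforest_2017.items()}
print(loss_2017)
```

Every country keeps its row, so the number of observations is unchanged.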

Distribution of agricultural land

The variable agricultural land also has many outliers, and in all years the median is lower than the mean: the boxes sit at the bottom of the plot while the outliers extend far above them. We cannot see an important change in agricultural land between the years. Most of the values are between 0 and 100000 sq km. We now look at the outliers that differ most from the rest of the sample in each year, to better understand and check the values of the data.

Outliers 1995
Country agriland
China 5237140
Australia 4633880
Brazil 2278050
Kazakhstan 2171865
India 1809450
Argentina 1280450
Mexico 1061950
South Africa 975200
Canada 612610
Colombia 445130
Indonesia 429780
Outliers 2000
Country agriland
China 5237310
Australia 4554690
Brazil 2283235
Kazakhstan 2153933
India 1809750
Argentina 1285100
Mexico 1063300
South Africa 981250
Nigeria 661840
Canada 612870
Indonesia 471770
Colombia 448590
Outliers 2005
Country agriland
China 5266822
Australia 4102300
Brazil 2288420
Kazakhstan 2122860
India 1801260
Argentina 1377975
Mexico 1065700
South Africa 974830
Canada 616560
Indonesia 518460
Colombia 425570
Outliers 2010
Country agriland
China 5289168
Australia 3763720
Brazil 2318342
Kazakhstan 2171618
India 1795730
Argentina 1474810
Mexico 1028440
South Africa 968910
Nigeria 678170
Canada 582800
Indonesia 556000
Angola 525120
Colombia 425030
Outliers 2015
Country agriland
China 5286334
Australia 3481190
Brazil 2354382
Kazakhstan 2162597
India 1796740
Argentina 1487000
Mexico 1032120
South Africa 963410
Nigeria 686336
Canada 579850
Indonesia 573000
Angola 552873
Colombia 447539

These tables show that the outliers are mostly the same from year to year, and we already know most of these countries well from the exploration of deforestation. China and India have a large amount of agricultural land, and in the same years the forest area of these countries grew. Colombia, Angola, Nigeria, Argentina, Brazil and Indonesia also have a large amount of agricultural land and were among the countries most affected by deforestation. Australia also has a large amount of agricultural land and, depending on the year, either was affected by deforestation or enlarged its forest area.
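
Outlier lists like the ones above can be produced with the conventional boxplot rule, which flags values more than 1.5 times the interquartile range above the third quartile. The report does not state which rule was used, so this plain-Python sketch is an assumption; the helper `upper_outliers` and the sample values are hypothetical. Quartile definitions also differ slightly between tools, so borderline cases may not match a given plotting library.

```python
from statistics import quantiles


def upper_outliers(values_by_country, k=1.5):
    """Return countries whose value exceeds Q3 + k * IQR, largest first."""
    values = sorted(values_by_country.values())
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    fence = q3 + k * (q3 - q1)
    flagged = [c for c, v in values_by_country.items() if v > fence]
    return sorted(flagged, key=lambda c: values_by_country[c], reverse=True)


# Hypothetical values, for illustration only.
sample = {"A": 10, "B": 12, "C": 14, "D": 15, "E": 18, "F": 20, "G": 500}
outliers = upper_outliers(sample)
print(outliers)
```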

Distribution of agricultural raw material exports

The variable agricultural raw material exports also has some outliers, and in every year the distribution is right-skewed: the boxes sit at the bottom of the plot and the median is lower than the mean. Agricultural raw material exports hardly change between the years. Most values lie between 0 and 10%. We now look at the most extreme outliers in each year to better understand and check the values in the data.
Outliers 1995
Country agriexp
Burkina Faso 62.7
Paraguay 36.4
Cameroon 27.5
Uruguay 14.9
Ethiopia 13.4
Outliers 2000
Country agriexp
Burkina Faso 58.0
Mali 36.1
Ethiopia 17.6
Tajikistan 12.2
Outliers 2005
Country agriexp
Burkina Faso 74.7
Mali 24.5
Cameroon 19.3
Albania 13.9
Ethiopia 12.9
Outliers 2010
Country agriexp
Burkina Faso 17.5
Cameroon 14.8
Uruguay 10.5
Outliers 2015
Country agriexp
Cameroon 16.9
Burkina Faso 13.7
Uruguay 13.5

In these tables we can see that the outliers are mostly the same countries: Burkina Faso, Cameroon, Uruguay, Ethiopia and Mali.

Distribution of population growth

In this plot we can see that the median and the mean of this variable are roughly equal, because the box is located in the middle of the plot. Between 1995 and 2000 the median population growth decreased by ca. 0.25 percentage points, and the population growth of all countries lies between -2 and 3.5%. This variable has no outliers.

Distribution of soy production

The variable soy production has many outliers, and in every year the distribution is strongly right-skewed: the boxes sit at the bottom of the plot and the median is lower than the mean. We can also see that the values of the outliers grow over the years. We now look at the most extreme outliers in each year to better understand and check the values in the data.
Outliers 1995
Country soy
Brazil 4533712
China 3335324
Argentina 2633388
India 1413272
Indonesia 845549
Canada 483808
Outliers 2000
Country soy
Brazil 5131057
China 4429960
Argentina 3626257
India 1015444
Paraguay 555606
Japan 523359
Outliers 2005
Country soy
Brazil 10180069
China 6563619
Argentina 6527335
India 1894838
Canada 716117
Outliers 2010
Country soy
Brazil 24751814
Argentina 13838818
China 11139314
India 5035321
Paraguay 2528581
Canada 1656694
Uruguay 681203
Indonesia 669781
Outliers 2015
Country soy
Brazil 30907303
Argentina 12940965
China 9524513
India 3141986
Paraguay 2855165
Canada 2128292
Ukraine 1341227
Uruguay 1040094
Indonesia 599012

In most years the outliers are Brazil, China, Argentina, India, Canada, Paraguay and Indonesia, joined by Uruguay in 2010 and 2015, because these countries produced the most soy.

Distribution of openness to trade

In this plot we can see that the median of the ratio of imports and exports to GDP (in percent) grows from 1995 to 2005. The countries' ratios are mostly spread between 25 and 150% of GDP. We now look at the countries with the 10 highest values in each year to get an impression of which countries are the most open to trade.
Countries that are most open to trade in 1995
Country opentr
Panama 175.5
Suriname 140.8
Paraguay 130.7
Honduras 114.9
Belize 105.6
Slovenia 93.5
Thailand 89.8
Kazakhstan 82.5
Hungary 78.3
Switzerland 77.1
Countries that are most open to trade in 2000
Country opentr
Tajikistan 193
Panama 140
Hungary 137
Belize 127
Thailand 121
Honduras 120
Ukraine 116
Cambodia 112
Suriname 109
Kazakhstan 106
Countries that are most open to trade in 2005
Country opentr
Panama 142
Thailand 138
Cambodia 137
Honduras 136
Hungary 127
Slovenia 120
Belize 117
Azerbaijan 116
Suriname 111
Paraguay 104
Countries that are most open to trade in 2010
Country opentr
Hungary 158
Panama 148
Thailand 127
Slovenia 127
Switzerland 117
Belize 116
Cambodia 114
Bhutan 113
Honduras 109
Paraguay 107
Countries that are most open to trade in 2015
Country opentr
Hungary 170
Slovenia 145
Bulgaria 128
Cambodia 128
Thailand 126
Belize 124
North Macedonia 114
Switzerland 113
Ukraine 108
Honduras 107

We can see that Panama, Suriname, Honduras, Belize, Thailand, Kazakhstan, Hungary, Cambodia, Slovenia, Paraguay, Switzerland and Ukraine are among the 10 countries most open to trade in several of the years.
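Picking the highest values per year can be sketched in Python as follows. The helper name `top_n` is hypothetical; the values are the 1995 openness-to-trade figures copied from the first table above, so the example can only re-rank those ten countries.

```python
import heapq

# Openness to trade in 1995 (% of GDP), copied from the 1995 table above.
opentr_1995 = {
    "Panama": 175.5, "Suriname": 140.8, "Paraguay": 130.7, "Honduras": 114.9,
    "Belize": 105.6, "Slovenia": 93.5, "Thailand": 89.8, "Kazakhstan": 82.5,
    "Hungary": 78.3, "Switzerland": 77.1,
}

def top_n(values_by_country, n):
    """Return the n (country, value) pairs with the highest values."""
    return heapq.nlargest(n, values_by_country.items(), key=lambda item: item[1])

top3 = top_n(opentr_1995, 3)
print(top3)
```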

Distribution of coal production

The variable coal production has many outliers, and in every year the distribution is strongly right-skewed: the boxes sit at the bottom of the plot and the median is lower than the mean. We can also see that the values of the outliers grow over the years, while most values are close to 0 quad. We now look at the most extreme outliers in each year to better understand and check the values in the data.
Outliers in 1995
Country coal
China 28.779
Australia 5.262
India 4.695
South Africa 4.389
Germany 3.418
Kazakhstan 2.146
Canada 1.587
Indonesia 0.879
Colombia 0.655
Spain 0.503
Outliers in 2000
Country coal
China 30.828
Australia 6.533
India 5.396
South Africa 4.929
Germany 2.579
Kazakhstan 1.990
Indonesia 1.783
Ukraine 1.545
Canada 1.464
Colombia 0.966
Outliers in 2005
Country coal
China 52.67
Australia 7.99
India 7.03
South Africa 5.37
Indonesia 3.79
Poland 2.72
Kazakhstan 2.20
Colombia 1.50
Ukraine 1.50
Canada 1.39
Outliers in 2010
Country coal
China 76.36
Australia 9.17
India 8.62
Indonesia 5.74
South Africa 5.58
Kazakhstan 2.95
Poland 2.24
Germany 1.93
Colombia 1.88
Canada 1.44
Outliers in 2015
Country coal
China 85.452
Australia 11.562
India 9.587
Indonesia 9.357
South Africa 5.548
Kazakhstan 2.908
Colombia 2.226
Germany 1.796
Canada 1.269
Ukraine 0.831

We can see that the outlier with the highest production value is China in every year. Australia, India, South Africa, Germany, Kazakhstan, Indonesia, Canada, Colombia, Ukraine and Poland are, in most years, the other outliers that differ most from the remaining values.

Distribution of gas production

The variable gas production has many outliers, and in every year the distribution is strongly right-skewed: the boxes sit at the bottom of the plot and the median is lower than the mean. Most values are below 1 quad. We now look at the most extreme outliers in each year to better understand and check the values in the data.
Outliers in 1995
Country gas
Canada 5.72
Indonesia 2.44
Outliers in 2000
Country gas
Canada 6.60
Indonesia 2.44
Argentina 1.38
Mexico 1.36
Outliers in 2005
Country gas
Canada 6.738
Indonesia 2.181
China 1.842
Argentina 1.684
Mexico 1.553
Australia 1.382
India 1.092
Pakistan 0.944
Outliers in 2010
Country gas
Canada 5.57
China 3.48
Indonesia 3.18
Australia 1.96
India 1.94
Mexico 1.85
Argentina 1.48
Thailand 1.25
Pakistan 1.21
Outliers in 2015
Country gas
China 4.904
Australia 2.503
India 1.155
Indonesia 2.802
South Africa 0.041

We can see that Canada, Indonesia, Argentina, Mexico, China, Australia and India are outliers in at least three of the five years; Pakistan appears in 2005 and 2010.
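Statements such as "outliers in most years" can be checked by counting how often each country appears in the yearly tables. A small Python sketch using the gas-production tables above; reading "most years" as at least three of the five years is an assumption made here, not a definition taken from the report.

```python
from collections import Counter

# Gas-production outliers per year, copied from the tables above.
gas_outliers = {
    1995: ["Canada", "Indonesia"],
    2000: ["Canada", "Indonesia", "Argentina", "Mexico"],
    2005: ["Canada", "Indonesia", "China", "Argentina", "Mexico",
           "Australia", "India", "Pakistan"],
    2010: ["Canada", "China", "Indonesia", "Australia", "India",
           "Mexico", "Argentina", "Thailand", "Pakistan"],
    2015: ["China", "Australia", "India", "Indonesia", "South Africa"],
}

# Count in how many of the five years each country is listed.
counts = Counter(country for countries in gas_outliers.values()
                 for country in countries)

# Assumed threshold: "most years" means at least 3 of the 5 years.
recurring = sorted(country for country, n in counts.items() if n >= 3)
print(recurring)
```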

Distribution of petrol production

The variable petrol production has many outliers, and in every year the distribution is strongly right-skewed: the boxes sit at the bottom of the plot and the median is lower than the mean. Most values are below 2 quad. We now look at the most extreme outliers in each year to better understand and check the values in the data.
Outliers in 1995
Country petrol
China 6.42
Mexico 6.41
Canada 4.67
Indonesia 3.26
Argentina 1.62
Outliers in 2000
Country petrol
Mexico 7.29
China 6.99
Canada 5.22
Nigeria 4.66
Indonesia 3.13
Brazil 2.80
Argentina 1.74
Australia 1.58
Kazakhstan 1.56
Outliers in 2005
Country petrol
Mexico 7.95
China 7.75
Canada 5.96
Brazil 3.65
Kazakhstan 2.83
Indonesia 2.26
Argentina 1.72
India 1.58
Ecuador 1.17
Colombia 1.16
Australia 1.07
Outliers in 2010
Country petrol
China 8.76
Canada 6.68
Mexico 6.19
Nigeria 5.25
Brazil 4.56
Angola 4.08
Kazakhstan 3.40
Azerbaijan 2.23
Indonesia 2.05
India 1.76
Colombia 1.73
Argentina 1.51
Australia 1.08
Outliers in 2015
Country petrol
China 9.19
Canada 8.82
Mexico 5.43
Brazil 5.40
Nigeria 4.73
Angola 3.87
Kazakhstan 3.69
Colombia 2.21
Azerbaijan 1.83
India 1.77
Indonesia 1.71

The countries China, Mexico, Canada, Indonesia, Argentina, Brazil, Kazakhstan, Australia, Nigeria, India, Colombia, Angola and Azerbaijan are the outliers in most of the years.

Distribution of nuclear production

The variable nuclear production has fewer outliers than the other variables, and in every year the distribution is right-skewed: the boxes sit at the bottom of the plot and the median is lower than the mean. Most values are below 2 quad. We now look at the few outliers in each year to better understand and check the values in the data.
Outliers in 1995
Country nuc
France 3.71
Japan 2.83
Outliers in 2000
Country nuc
France 4.08
Japan 3.11
Outliers in 2005
Country nuc
France 4.46
Japan 2.85
Outliers in 2010
Country nuc
France 4.26
Japan 2.85
Outliers in 2015
Country nuc
France 4.35

France is an outlier in every year, and Japan in every year except 2015.

Distribution of production of renewables and others

The variable renewables production again has some outliers, and in every year the distribution is right-skewed: the boxes sit at the bottom of the plot and the median is lower than the mean. Most values are below 1 quad. We now look at the outliers in each year to better understand and check the values in the data.
Outliers in 1995
Country rew
Canada 3.49
Brazil 2.94
China 1.94
Japan 1.00
Outliers in 2000
Country rew
Canada 3.71
Brazil 3.39
China 2.28
Japan 1.28
Outliers in 2005
Country rew
China 4.01
Brazil 3.84
Canada 3.69
Japan 1.25
India 1.16
Outliers in 2010
Country rew
China 7.720
Brazil 4.853
Canada 3.635
India 1.450
Japan 1.392
Germany 1.229
France 0.876
Italy 0.802
Outliers in 2015
Country rew
China 13.118
Brazil 4.727
Canada 3.938
Germany 1.952
India 1.818
Japan 1.710
Italy 1.057
France 0.989

Canada, Brazil, China and Japan are outliers in every year and India from 2005 onwards; Germany, Italy and France join them in 2010 and 2015.

Correlation between the different variables

We want to show which factors influence the amount of deforestation. We therefore look at the correlation coefficients, which show whether and how deforestation is correlated with the other variables in our data set. The correlation coefficients between all pairs of variables are shown in this correlation plot.

We are especially interested in the correlation coefficients between the variable deforestation and all the other variables, which we can see in this table:
deforest
Country -0.17
Year -0.08
agriland 0.25
chagri 0.01
chagrip 0.03
agriexp 0.01
foodexp 0.05
forest 0.74
deforest 1.00
deforestp -0.19
land 0.43
pop 0.07
soy 0.46
opentr -0.21
prod 0.06
coal -0.03
gas 0.07
petrol 0.23
nuclrew 0.22
nuc -0.15
rew 0.32

Not all of these correlation coefficients are necessarily significant, so we show an additional correlation plot that indicates whether the correlation between the variables is significant.

Variable corr (with deforest) p-value
agriexp 0.01 0.609
agriland 0.25 0.000
chagri 0.01 0.655
chagrip 0.03 0.252
coal -0.03 0.290
Country -0.17 0.000
deforest 1.00 0.000
deforestp -0.19 0.000
foodexp 0.05 0.082
forest 0.74 0.000
gas 0.07 0.007
land 0.43 0.000
nuc -0.15 0.002
nuclrew 0.22 0.000
opentr -0.21 0.000
petrol 0.23 0.000
pop 0.07 0.005
prod 0.06 0.031
rew 0.32 0.000
soy 0.46 0.000
Year -0.08 0.004

As this table shows, deforestation has a significant positive correlation with agricultural land, gas production, petrol production, population growth, production of renewables and soy production, and a significant negative correlation with nuclear production and openness to trade. The significant correlation between deforestation and country is not of interest to us, because the variable country is simply encoded in alphabetical order and therefore carries little information. The negative correlation between deforestation and deforestation in percent comes from the transformation of the deforestation variable from negative to positive values and is not of interest for our purpose. We are of course also interested in the correlation coefficients among the other variables, in order to be aware of multicollinearity problems. We will cover this in the analysis part: as the correlation plot shows, agricultural land is significantly correlated with soy production, gas production, openness to trade, petrol production, nuclear production and production of renewables, and the same holds for further pairs of variables.
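To make the link between a correlation coefficient and its significance concrete, here is a small Python sketch. It computes Pearson's r by hand and a two-sided p-value from the usual t statistic, using a normal approximation that is only reasonable for large samples. The sample size n = 1441 is an assumption (461 observations used plus 980 deleted, as printed in the regression summaries); the report's own p-values were probably computed on pairwise complete observations, so they need not match exactly.

```python
import math
from statistics import NormalDist

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided p-value for H0: correlation = 0 (large-sample normal approximation)."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))

r_toy = pearson([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly linear toy data
p_gas = approx_p_value(0.07, 1441)           # r for gas from the table, assumed n
print(r_toy)
print(p_gas < 0.01)  # even a weak correlation is significant with this many rows
```

This also illustrates why coefficients as small as 0.07 appear as significant in the table: with well over a thousand observations, statistical significance says little about the strength of a relationship.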

Correlation between deforestation and agricultural land

In this plot we can see a slight upward trend from the bottom left to the top right, which reflects the low positive correlation between agricultural land and deforestation.

Correlation between deforestation and gas production

In this plot we can see a slight upward trend from the bottom left to the top right, which reflects the positive correlation between gas production and deforestation. The trend, like the correlation itself, is very weak: there are also many observations in the bottom right and in the top left corner.

Correlation between deforestation and nuclear production

#> Warning: Removed 980 rows containing missing values (geom_point).

There is a negative correlation between these two variables, but it is so weak that it is hard to see in a scatter plot.

Correlation between deforestation and openness to trade

In this plot we can see a slight downward trend from the top left corner to the bottom right corner, but again the correlation is very low.

Correlation between deforestation and petrol production

In this plot we can see a slight upward trend from the bottom left corner to the top right, which reflects the low positive correlation between petrol production and deforestation.

Correlation between deforestation and population growth

There is a positive correlation between these two variables, but it is so weak that it is hard to see in a scatter plot: most of the observations lie in the lower middle.

Correlation between deforestation and soy production

In this plot we can see an upward trend from the bottom left to the top right, which reflects the moderate positive correlation between soy production and deforestation.

Correlation between deforestation and renewables production

In this plot we can see a slight upward trend from the bottom left to the top right, which reflects the positive correlation between renewables production and deforestation.

Analysis

Structure of the analysis

  1. Ordinary Least Square regression
  2. Model Selection
  3. Heteroskedasticity
  4. Panel Data Model
  5. Balanced Model
  6. OLS per year
  7. Comparison of the results
  8. Answer to the research question

1. OLS regression

We can now finally run the regression with our initial model. We first run a basic OLS regression with deforestation as the dependent variable. The explanatory variables are:

  * agricultural land (agriland)
  * agricultural exports (agriexp)
  * food exports (foodexp)
  * population growth (pop)
  * soy production (soy)
  * openness to trade (opentr)
  * all the extraction variables:
    * total production (prod)
    * coal production (coal)
    * gas production (gas)
    * petrol production (petrol)
    * nuclear and renewables production combined (nuclrew)
    * nuclear production (nuc)
    * renewables production (rew)

All our variables are measured by country and year. We first use the “merged” data set, in which the years range from 1980 to 2017.

Here is our first OLS regression, an initial model that includes almost all variables.

Initial model

Observations 461 (980 missing obs. deleted)
Dependent variable deforest
Type OLS linear regression
F(11,449) 102.29
R² 0.71
Adj. R² 0.71
Est. S.E. t val. p
(Intercept) -4690.10 948.69 -4.94 0.00
agriland 0.00 0.00 7.64 0.00
agriexp 843.89 175.09 4.82 0.00
foodexp 149.14 23.70 6.29 0.00
pop 397.68 274.20 1.45 0.15
soy -0.00 0.00 -7.33 0.00
opentr 18.64 8.24 2.26 0.02
prod 5423.76 254.35 21.32 0.00
coal -6100.78 277.79 -21.96 0.00
gas -9691.62 454.34 -21.33 0.00
petrol -4869.97 313.80 -15.52 0.00
nuclrew NA NA NA NA
nuc -6394.03 384.86 -16.61 0.00
rew NA NA NA NA
Standard errors: OLS

2. Model selection

Before proceeding with the model selection, we comment on the result of the OLS regression. Our R-squared is quite high, which indicates that the model explains most of the variation in deforestation. This result is very optimistic and will prove to be unrealistic. In this first regression the model explains 70.8% of the variation. As we included all our variables in this first regression without any selection, a high R-squared is not surprising.
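The fit statistics printed in the summary are linked by standard OLS identities, which allows a quick consistency check. The Python sketch below recovers R-squared from the reported F(11,449) = 102.29 and then the adjusted R-squared for n = 461. The function names are hypothetical, and the check relies only on the rounded figures printed above, not on the original data.

```python
def r2_from_f(f_stat, k, df_resid):
    """R^2 implied by the overall F statistic of an OLS model with intercept."""
    return f_stat * k / (f_stat * k + df_resid)

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k regressors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2 = r2_from_f(102.29, k=11, df_resid=449)
adj = adjusted_r2(r2, n=461, k=11)
print(round(r2, 3), round(adj, 3))  # 0.715 0.708
```

Both values round to the 0.71 shown in the summary, and the 70.8% quoted in the text appears to correspond to the adjusted figure.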

The dependent variable of our model, deforest, is negative when the forest area increased from one year to the next and positive when it decreased, so that it measures deforestation. We now check whether the signs of our estimates are in line with our expectations. For example, the agriland coefficient is positive, which is intuitive since we picked this variable to see whether agriculture increases deforestation. In fact, we expect all explanatory variables to increase deforestation and thus to have a positive sign. We notice that the coefficients of soy production and of all the individual extraction variables do not have the expected sign. This is probably because we include total production as well as the production of each source separately. The information overlaps strongly, which creates a multicollinearity issue.

A multicollinearity problem occurs when two or more explanatory variables are strongly correlated with one another. We can then no longer isolate the effect of each variable. For prediction this does not matter much, but it matters for measuring the influence of each factor. Here it shows up in coefficient signs that make no sense.

To solve this strong multicollinearity issue we need to choose which variables to keep and which to remove. To measure the overlap of information we use the variance inflation factor (VIF):

#>    Variables Tolerance  VIF
#> 1   agriland     0.121 8.26
#> 2    agriexp     0.505 1.98
#> 3    foodexp     0.478 2.09
#> 4        pop     0.610 1.64
#> 5        soy     0.338 2.96
#> 6     opentr     0.468 2.14
#> 7       prod     0.000  Inf
#> 8       coal     0.000  Inf
#> 9        gas     0.000  Inf
#> 10    petrol     0.000  Inf
#> 11   nuclrew     0.000  Inf
#> 12       nuc     0.000  Inf
#> 13       rew     0.000  Inf

As a rule of thumb, a VIF above 5 indicates multicollinearity. We will remove the variables with the highest VIF one by one. Intuitively, the variable “prod” contains all the extraction variables, so we should not use both prod and its components in the same model. As we want each factor isolated, we first remove prod from the initial model and recompute the VIF indicators.
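The two columns of the VIF table are directly related: the tolerance of a predictor is 1 - R² from regressing it on all other predictors, and the VIF is its reciprocal. A minimal Python sketch of that relationship, using a few of the rounded tolerances printed above (the helper name `vif_from_tolerance` is hypothetical):

```python
import math

def vif_from_tolerance(tolerance):
    """VIF = 1 / tolerance; a tolerance of 0 means exact linear dependence."""
    return math.inf if tolerance == 0 else 1.0 / tolerance

# Tolerances as printed in the table above (rounded to three decimals).
tolerances = {"agriland": 0.121, "agriexp": 0.505, "soy": 0.338, "prod": 0.000}

for name, tol in tolerances.items():
    vif = vif_from_tolerance(tol)
    flag = "above 5" if vif > 5 else "ok"
    print(f"{name:9s} VIF = {vif:5.2f} ({flag})")
```

A tolerance of 0, and hence an infinite VIF, means the variable is an exact linear combination of the others, which is what one would expect if prod is the sum of the individual extraction variables.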

Second model

Observations 461 (980 missing obs. deleted)
Dependent variable deforest
Type OLS linear regression
F(11,449) 102.29
R² 0.71
Adj. R² 0.71
Est. S.E. t val. p
(Intercept) -4690.10 948.69 -4.94 0.00
agriland 0.00 0.00 7.64 0.00
agriexp 843.89 175.09 4.82 0.00
foodexp 149.14 23.70 6.29 0.00
pop 397.68 274.20 1.45 0.15
soy -0.00 0.00 -7.33 0.00
opentr 18.64 8.24 2.26 0.02
coal -677.02 40.47 -16.73 0.00
gas -4267.87 299.00 -14.27 0.00
petrol 553.79 177.86 3.11 0.00
nuclrew 5423.76 254.35 21.32 0.00
nuc -6394.03 384.86 -16.61 0.00
rew NA NA NA NA
Standard errors: OLS
#>    Variables Tolerance  VIF
#> 1   agriland     0.121 8.26
#> 2    agriexp     0.505 1.98
#> 3    foodexp     0.478 2.09
#> 4        pop     0.610 1.64
#> 5        soy     0.338 2.96
#> 6     opentr     0.468 2.14
#> 7       coal     0.134 7.47
#> 8        gas     0.203 4.93
#> 9     petrol     0.182 5.51
#> 10   nuclrew     0.000  Inf
#> 11       nuc     0.000  Inf
#> 12       rew     0.000  Inf

Now “nuclrew” has the highest VIF. Again, this is not surprising, as it contains the information of both “nuc” and “rew”. We remove it and re-run the regression.
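The initial and the second model report identical fit statistics (same F and R²), which suggests that dropping prod only re-labels the same fit. Assuming prod is exactly the sum coal + gas + petrol + nuclrew (an assumption consistent with the infinite VIF values, but not stated in the report), each component should simply absorb the former prod coefficient. The Python sketch below checks this with the coefficients printed in the two summaries.

```python
# Coefficients copied from the initial and the second model summaries above.
initial = {"prod": 5423.76, "coal": -6100.78, "gas": -9691.62,
           "petrol": -4869.97, "nuc": -6394.03}
second = {"coal": -677.02, "gas": -4267.87, "petrol": 553.79,
          "nuclrew": 5423.76, "nuc": -6394.03}

# Assumption: prod = coal + gas + petrol + nuclrew holds exactly in the data.
# Then b_prod * prod can be redistributed onto the four components.
for name in ("coal", "gas", "petrol"):
    implied = initial[name] + initial["prod"]
    print(f"{name:7s} implied {implied:9.2f}   reported {second[name]:9.2f}")
```

The implied and reported values agree up to rounding, and nuclrew takes over the former prod coefficient. The negative signs in the initial model therefore say little about the individual energy sources.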

Third model

Observations 461 (980 missing obs. deleted)
Dependent variable deforest
Type OLS linear regression
F(11,449) 102.29
R² 0.71
Adj. R² 0.71
Est. S.E. t val. p
(Intercept) -4690.10 948.69 -4.94 0.00
agriland 0.00 0.00 7.64 0.00
agriexp 843.89 175.09 4.82 0.00
foodexp 149.14 23.70 6.29 0.00
pop 397.68 274.20 1.45 0.15
soy -0.00 0.00 -7.33 0.00
opentr 18.64 8.24 2.26 0.02
coal -677.02 40.47 -16.73 0.00
gas -4267.87 299.00 -14.27 0.00
petrol 553.79 177.86 3.11 0.00
nuc -970.28 229.11 -4.23 0.00
rew 5423.76 254.35 21.32 0.00
Standard errors: OLS
#>    Variables Tolerance  VIF
#> 1   agriland     0.121 8.26
#> 2    agriexp     0.505 1.98
#> 3    foodexp     0.478 2.09
#> 4        pop     0.610 1.64
#> 5        soy     0.338 2.96
#> 6     opentr     0.468 2.14
#> 7       coal     0.134 7.47
#> 8        gas     0.203 4.93
#> 9     petrol     0.182 5.51
#> 10       nuc     0.666 1.50
#> 11       rew     0.169 5.92

“agriland” now has the highest VIF. The correlation plot shows that “agriland” is mainly correlated with some of the extraction variables. We will not remove agriland, as it is one of the variables we are most interested in. Instead we first remove “coal”, which has the second highest VIF.

Fourth model

Observations 461 (980 missing obs. deleted)
Dependent variable deforest
Type OLS linear regression
F(10,450) 52.19
R² 0.54
Adj. R² 0.53
Est. S.E. t val. p
(Intercept) -3601.28 1204.57 -2.99 0.00
agriland -0.00 0.00 -6.00 0.00
agriexp 2131.63 200.16 10.65 0.00
foodexp 244.89 29.27 8.37 0.00
pop 676.31 348.33 1.94 0.05
soy -0.00 0.00 -2.17 0.03
opentr -5.71 10.32 -0.55 0.58
gas -5352.87 371.48 -14.41 0.00
petrol 1579.68 212.49 7.43 0.00
nuc -749.05 291.11 -2.57 0.01
rew 3183.05 275.20 11.57 0.00
Standard errors: OLS
#>    Variables Tolerance  VIF
#> 1   agriland     0.282 3.54
#> 2    agriexp     0.626 1.60
#> 3    foodexp     0.508 1.97
#> 4        pop     0.613 1.63
#> 5        soy     0.367 2.72
#> 6     opentr     0.483 2.07
#> 7        gas     0.213 4.70
#> 8     petrol     0.206 4.85
#> 9        nuc     0.669 1.50
#> 10       rew     0.234 4.28

Removing the coal variable lowered the R-squared noticeably, from 0.71 to 0.54, but now all our VIF indicators are below five. We could have expected a correlation between “agriexp” and “foodexp”, but the VIF of both is below 5, so we can keep both of them for the moment.

However, the “opentr” coefficient is not significant (large p-value), so we run the regression without it:

Fifth model

Observations 461 (980 missing obs. deleted)
Dependent variable deforest
Type OLS linear regression
F(9,451) 58.04
R² 0.54
Adj. R² 0.53
Est. S.E. t val. p
(Intercept) -4185.57 578.43 -7.24 0.00
agriland -0.00 0.00 -6.01 0.00
agriexp 2151.33 196.81 10.93 0.00
foodexp 248.82 28.38 8.77 0.00
pop 777.35 296.36 2.62 0.01
soy -0.00 0.00 -2.17 0.03
gas -5379.95 367.96 -14.62 0.00
petrol 1585.33 212.08 7.48 0.00
nuc -684.46 266.45 -2.57 0.01
rew 3189.29 274.76 11.61 0.00
Standard errors: OLS
#>   Variables Tolerance  VIF
#> 1  agriland     0.293 3.42
#> 2   agriexp     0.647 1.55
#> 3   foodexp     0.539 1.85
#> 4       pop     0.845 1.18
#> 5       soy     0.367 2.72
#> 6       gas     0.217 4.62
#> 7    petrol     0.207 4.84
#> 8       nuc     0.797 1.25
#> 9       rew     0.234 4.27

Removing openness to trade did not lower the R-squared, and all coefficients are now significant at the 5% level.

Nevertheless, even after dealing with the multicollinearity, four of our nine coefficients are still negative (the wrong sign).

We can run a stepwise selection based on the AIC criterion to see whether it identifies other variables to delete.

#> Start:  AIC=7717
#> deforest ~ agriland + agriexp + foodexp + pop + soy + opentr + 
#>     prod + coal + gas + petrol + nuclrew + nuc + rew
#> 
#> 
#> Step:  AIC=7717
#> deforest ~ agriland + agriexp + foodexp + pop + soy + opentr + 
#>     prod + coal + gas + petrol + nuclrew + nuc
#> 
#> 
#> Step:  AIC=7717
#> deforest ~ agriland + agriexp + foodexp + pop + soy + opentr + 
#>     prod + coal + gas + petrol + nuc
#> 
#>            Df  Sum of Sq         RSS  AIC
#> <none>                    8148592726 7717
#> - pop       1   38174490  8186767217 7717
#> - opentr    1   92853771  8241446497 7720
#> - agriexp   1  421570400  8570163127 7738
#> - foodexp   1  718650843  8867243570 7754
#> - soy       1  975299419  9123892145 7767
#> - agriland  1 1059159049  9207751775 7771
#> - petrol    1 4370906888 12519499614 7913
#> - nuc       1 5009293991 13157886717 7936
#> - prod      1 8252403012 16400995738 8038
#> - gas       1 8257811005 16406403731 8038
#> - coal      1 8753073376 16901666103 8051
#> 
#> Call:
#> lm(formula = deforest ~ agriland + agriexp + foodexp + pop + 
#>     soy + opentr + prod + coal + gas + petrol + nuc, data = merged)
#> 
#> Coefficients:
#>  (Intercept)      agriland       agriexp       foodexp  
#> -4690.102251      0.003445    843.893616    149.144817  
#>          pop           soy        opentr          prod  
#>   397.677584     -0.000469     18.642477   5423.756412  
#>         coal           gas        petrol           nuc  
#> -6100.775318  -9691.622319  -4869.969108  -6394.032951

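We can see that the procedure does not identify any further variables to remove: apart from rew and nuclrew, which are dropped at the start without changing the AIC, removing any single variable would increase the AIC. The AIC criterion does not take multicollinearity into account, however. Weighing both perspectives, we choose the fifth model.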
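The AIC values in the output above can be reproduced from the residual sums of squares. For linear models, R's stepwise procedure reports n · ln(RSS / n) + 2 · p, where p counts the estimated coefficients including the intercept. The Python sketch below applies this formula to two rows of the table; the function name `step_aic` is hypothetical, and n = 461 is taken from the regression summaries.

```python
import math

def step_aic(rss, n, n_params):
    """AIC of a linear model up to an additive constant: n * ln(RSS / n) + 2 * p."""
    return n * math.log(rss / n) + 2 * n_params

n = 461  # observations used in the regressions above

# "<none>" row: 11 regressors plus intercept.
aic_full = step_aic(8148592726, n, 12)
# "- coal" row: one regressor fewer, but a much larger RSS.
aic_without_coal = step_aic(16901666103, n, 11)

print(round(aic_full), round(aic_without_coal))  # 7717 8051
```

The penalty term only rewards a smaller model by 2 per dropped coefficient, so a variable is kept whenever removing it raises n · ln(RSS / n) by more than that, regardless of how collinear it is with the others.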

3. Heteroskedasticity

Even though we solved the multicollinearity issue and most of our variables are significant, we still have to check the distribution of our error term to determine whether our model is good or not.

We now plot our residuals to see whether the panel structure of our data affects the independence of the error term.

The residuals show a clear pattern, which indicates that the error term is not independently and identically distributed; in particular, its variance does not look constant, i.e. the errors are heteroskedastic. Our first intuition is that this comes from measuring cross-sectional data over time, but it can also have other sources, such as omitted-variable bias.
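The visual check of the residual plot can be complemented by a formal test. The report does not say that such a test was run; the following is only a Python sketch of the studentized Breusch-Pagan test for a single regressor, applied to synthetic data whose error spread grows with x by construction. The function names are hypothetical.

```python
import math

def simple_ols(x, y):
    """Fit y = a + b * x and return (residuals, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sst = sum((yi - my) ** 2 for yi in y)
    return resid, 1 - sum(e * e for e in resid) / sst

def breusch_pagan(x, y):
    """LM statistic n * R^2 from regressing squared residuals on x (1 df)."""
    resid, _ = simple_ols(x, y)
    _, r2_aux = simple_ols(x, [e * e for e in resid])
    lm = len(x) * r2_aux
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value

# Synthetic data: the error term is +x or -x, so its spread grows with x.
x = list(range(1, 41))
y = [2 + 3 * xi + xi * (1 if xi % 2 else -1) for xi in x]

lm, p = breusch_pagan(x, y)
print(p < 0.05)  # a small p-value rejects constant error variance
```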

To try to solve this problem we use a panel data model.

4. Panel Data Model

We can now further improve our model by taking into account that our variables are observed for different countries over a period of time. Observations taken over time imply error terms that are correlated across observations, because of unobserved factors linked to the time period, and this biases our estimators. As many of our variables are affected in this way, we have a strong incentive to use a panel data model.

There are different methods to analyse panel data. The “pooling” method basically gives the same results as our previous OLS model, which we found to suffer from heteroskedasticity. We need a model that does not assume that all countries have the same intercept. In our case we are not sure whether to use the “random effects” or the “fixed effects” model for our panel data. We therefore first run both regressions without any transformation, going back to the model with almost all the variables (the third model).
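The idea behind the fixed-effects ("within") model can be sketched in a few lines of Python: subtracting each country's own mean from its observations removes the country-specific intercept, and the slope is then estimated from the demeaned data. The sketch uses a made-up two-country panel with one regressor; the name `within_slope` is hypothetical and this is not the estimator code used for the output below.

```python
from collections import defaultdict

def within_slope(rows):
    """Fixed-effects (within) slope for one regressor.

    rows: iterable of (country, x, y) tuples.
    """
    groups = defaultdict(list)
    for country, x, y in rows:
        groups[country].append((x, y))
    num = den = 0.0
    for obs in groups.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        # Demeaning within each country removes its individual intercept.
        num += sum((x - mx) * (y - my) for x, y in obs)
        den += sum((x - mx) ** 2 for x, _ in obs)
    return num / den

# Hypothetical panel: y = 10 + 2x for country A and y = 100 + 2x for country B.
rows = [("A", 1, 12), ("A", 2, 14), ("A", 3, 16),
        ("B", 11, 122), ("B", 12, 124), ("B", 13, 126)]

slope = within_slope(rows)
print(slope)  # 2.0
```

A pooled regression on the same six points would give a much steeper slope, because it attributes the difference between the two country intercepts to x.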

Comparison fixed vs random effect

#> Oneway (individual) effect Within Model
#> 
#> Call:
#> plm(formula = deforest ~ agriland + agriexp + foodexp + pop + 
#>     soy + opentr + coal + gas + petrol + nuc + rew, data = merged, 
#>     model = "within", index = c("Country", "Year"))
#> 
#> Unbalanced Panel: n = 19, T = 5-27, N = 461
#> 
#> Residuals:
#>     Min.  1st Qu.   Median  3rd Qu.     Max. 
#> -10532.1   -259.8    -24.2    191.6  12925.1 
#> 
#> Coefficients:
#>              Estimate   Std. Error t-value             Pr(>|t|)    
#> agriland    0.0239215    0.0033714    7.10      0.0000000000053 ***
#> agriexp   -41.9811343   84.6503398   -0.50                0.620    
#> foodexp    -8.3653365   20.6163293   -0.41                0.685    
#> pop      -207.7198014  221.0946618   -0.94                0.348    
#> soy        -0.0007609    0.0000305  -24.98 < 0.0000000000000002 ***
#> opentr      9.9071633    5.0202971    1.97                0.049 *  
#> coal       35.0939705   26.5753008    1.32                0.187    
#> gas      -185.3113614  306.5952980   -0.60                0.546    
#> petrol    391.3110210  157.0977542    2.49                0.013 *  
#> nuc       102.6624855  218.2386171    0.47                0.638    
#> rew        31.6666259  144.8718728    0.22                0.827    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Total Sum of Squares:    3070000000
#> Residual Sum of Squares: 855000000
#> R-Squared:      0.722
#> Adj. R-Squared: 0.703
#> F-statistic: 101.612 on 11 and 431 DF, p-value: <0.0000000000000002
#> Oneway (individual) effect Random Effect Model 
#>    (Swamy-Arora's transformation)
#> 
#> Call:
#> plm(formula = deforest ~ agriland + agriexp + foodexp + pop + 
#>     soy + opentr + coal + gas + petrol + nuc + rew, data = merged, 
#>     model = "random", index = c("Country", "Year"))
#> 
#> Unbalanced Panel: n = 19, T = 5-27, N = 461
#> 
#> Effects:
#>                   var std.dev share
#> idiosyncratic 1984360    1409  0.43
#> individual    2641720    1625  0.57
#> theta:
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   0.639   0.826   0.832   0.828   0.835   0.835 
#> 
#> Residuals:
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   -6464    -424    -163      32      64   17100 
#> 
#> Coefficients:
#>                 Estimate   Std. Error z-value             Pr(>|z|)
#> (Intercept) -248.9251724 1082.4344690   -0.23              0.81812
#> agriland       0.0028299    0.0005288    5.35          0.000000087
#> agriexp      -67.2336168  114.9589100   -0.58              0.55865
#> foodexp       49.0166559   26.9428427    1.82              0.06887
#> pop          252.3703544  284.8321410    0.89              0.37560
#> soy           -0.0006119    0.0000354  -17.27 < 0.0000000000000002
#> opentr         1.5762093    6.9863602    0.23              0.82150
#> coal         -62.2733305   34.0530822   -1.83              0.06744
#> gas         -242.6849995  335.2209387   -0.72              0.46909
#> petrol      -119.6844679  186.0710079   -0.64              0.52008
#> nuc         -132.5863143  285.0946361   -0.47              0.64189
#> rew          742.6812194  198.5767435    3.74              0.00018
#>                
#> (Intercept)    
#> agriland    ***
#> agriexp        
#> foodexp     .  
#> pop            
#> soy         ***
#> opentr         
#> coal        .  
#> gas            
#> petrol         
#> nuc            
#> rew         ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Total Sum of Squares:    3760000000
#> Residual Sum of Squares: 1930000000
#> R-Squared:      0.487
#> Adj. R-Squared: 0.475
#> Chisq: 427.819 on 11 DF, p-value: <0.0000000000000002

We can see that agriland and soy are significant in both regressions. petrol and opentr are significant only in the fixed-effects regression, while rew is significant only in the random-effects regression, where coal and foodexp are marginally significant (at the 10% level). The fixed-effects model reports the higher R-squared (0.72 versus 0.49).

To select which type of model to use, we removed all variables that were not significant or were a potential source of multicollinearity. We included the “prod” variable, as it contains all the extraction variables. opentr becomes insignificant when we remove other variables, and so does foodexp. The resulting model is therefore well suited to compare both methods. We re-run both regressions:

#> Oneway (individual) effect Within Model
#> 
#> Call:
#> plm(formula = deforest ~ agriland + soy + prod, data = merged, 
#>     model = "within", index = c("Country", "Year"))
#> 
#> Unbalanced Panel: n = 68, T = 1-27, N = 1441
#> 
#> Residuals:
#>      Min.   1st Qu.    Median   3rd Qu.      Max. 
#> -10975.96    -28.96     -1.95     25.44  12241.58 
#> 
#> Coefficients:
#>            Estimate Std. Error t-value             Pr(>|t|)    
#> agriland  0.0013648  0.0005980    2.28                0.023 *  
#> soy      -0.0006335  0.0000189  -33.53 < 0.0000000000000002 ***
#> prod     39.1171978  8.5938980    4.55            0.0000058 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Total Sum of Squares:    4430000000
#> Residual Sum of Squares: 2420000000
#> R-Squared:      0.454
#> Adj. R-Squared: 0.426
#> F-statistic: 380.242 on 3 and 1370 DF, p-value: <0.0000000000000002
#> Oneway (individual) effect Random Effect Model 
#>    (Swamy-Arora's transformation)
#> 
#> Call:
#> plm(formula = deforest ~ agriland + soy + prod, data = merged, 
#>     model = "random", index = c("Country", "Year"))
#> 
#> Unbalanced Panel: n = 68, T = 1-27, N = 1441
#> 
#> Effects:
#>                   var std.dev share
#> idiosyncratic 1765646    1329  0.25
#> individual    5222649    2285  0.75
#> theta:
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   0.497   0.877   0.887   0.878   0.889   0.889 
#> 
#> Residuals:
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   -8183    -145     -83       5     -28   16043 
#> 
#> Coefficients:
#>                Estimate  Std. Error z-value             Pr(>|z|)    
#> (Intercept) 588.2809473 328.4888581    1.79              0.07331 .  
#> agriland      0.0016975   0.0003059    5.55          0.000000029 ***
#> soy          -0.0005953   0.0000201  -29.57 < 0.0000000000000002 ***
#> prod         30.5555231   9.0909384    3.36              0.00078 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Total Sum of Squares:    4790000000
#> Residual Sum of Squares: 2960000000
#> R-Squared:      0.383
#> Adj. R-Squared: 0.382
#> Chisq: 892.441 on 3 DF, p-value: <0.0000000000000002

In the random effect model, we assume that the intercepts differ across countries but that these differences are random (uncorrelated with the explanatory variables) and are not estimated individually. In contrast, the fixed effect model also lets each country have its own intercept, but treats these intercepts as parameters that can be estimated. To choose between the two models we use the Hausman test:
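To make the idea of country-specific intercepts concrete, here is a minimal base R sketch on made-up data (the `toy` data frame and its values are hypothetical and unrelated to `merged`). It shows that the "within" slope is what we get after subtracting each country's own mean, and that the country intercepts can be recovered afterwards.

```r
# Toy panel: 3 hypothetical countries, 4 years each (made-up numbers).
toy <- data.frame(
  Country = rep(c("A", "B", "C"), each = 4),
  x = c(1, 2, 3, 4,  2, 4, 6, 8,  1, 3, 5, 7),
  y = c(12, 14, 17, 19,  -3, 1, 4, 9,  51, 54, 59, 62)
)

# Within transformation: subtract each country's own mean.
toy$x_dm <- toy$x - ave(toy$x, toy$Country)
toy$y_dm <- toy$y - ave(toy$y, toy$Country)

# Slope from the demeaned regression (no intercept needed).
beta_within <- unname(coef(lm(y_dm ~ x_dm - 1, data = toy)))

# Same slope from a regression with one dummy per country (LSDV).
lsdv <- lm(y ~ x + factor(Country) - 1, data = toy)
beta_lsdv <- unname(coef(lsdv)["x"])

# Country intercepts: mean(y) - beta * mean(x), country by country.
alpha <- tapply(toy$y, toy$Country, mean) -
  beta_within * tapply(toy$x, toy$Country, mean)
```

Both routes give the same slope, and `alpha` matches the dummy coefficients of the second regression; this is the kind of quantity listed in the intercept tables of this section.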

#> 
#>  Hausman Test
#> 
#> data:  deforest ~ agriland + soy + prod
#> chisq = 33, df = 3, p-value = 0.0000003
#> alternative hypothesis: one model is inconsistent

The p-value of the Hausman test is very small, so we reject the null hypothesis that both models are consistent. This implies that the random effect model is inconsistent and that it is better to use the fixed effect model (model = “within”).
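As a quick check of the decision rule, the reported p-value can be reproduced from the test statistic and its degrees of freedom. This is only a sketch using the rounded statistic printed above; the output format suggests the test came from `plm::phtest`, but that is an assumption.

```r
# Values copied from the Hausman test output above.
chisq <- 33   # reported (rounded) test statistic
df    <- 3    # one degree of freedom per compared coefficient

# Upper-tail probability of a chi-squared distribution.
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)

# Decision at the 5% level: TRUE means "prefer the fixed effect model".
prefer_fixed <- p_value < 0.05
```

The resulting p-value is of the same order of magnitude as the 0.0000003 reported in the output.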

#> 
#>  Lagrange Multiplier Test - time effects (Breusch-Pagan) for
#>  unbalanced panels
#> 
#> data:  deforest ~ agriland + agriexp + foodexp + pop + soy + opentr +  ...
#> chisq = 16, df = 1, p-value = 0.00006
#> alternative hypothesis: significant effects

Moreover, the Lagrange Multiplier (Breusch-Pagan) test for time effects confirms our intuition that countries behave differently over time: with a p-value of 0.00006 we reject the null hypothesis of no significant effects. This supports using a panel data model instead of pooled OLS. The panel specification also addresses the heteroskedasticity that affected our previous OLS model.

Now we can look more closely at which extraction factor has the most influence on deforestation. We end up selecting the following model, as it is the only specification in which all coefficients are significant.

Panel fixed effect model

#> Oneway (individual) effect Within Model
#> 
#> Call:
#> plm(formula = deforest ~ agriland + soy + petrol + rew, data = merged, 
#>     model = "within", index = c("Country", "Year"))
#> 
#> Unbalanced Panel: n = 68, T = 1-27, N = 1441
#> 
#> Residuals:
#>      Min.   1st Qu.    Median   3rd Qu.      Max. 
#> -11265.11    -43.81     -1.14     31.89  12174.87 
#> 
#> Coefficients:
#>            Estimate Std. Error t-value             Pr(>|t|)    
#> agriland   0.001262   0.000594    2.13                0.034 *  
#> soy       -0.000679   0.000021  -32.37 < 0.0000000000000002 ***
#> petrol   506.127207 106.296050    4.76            0.0000021 ***
#> rew      177.179701  63.047229    2.81                0.005 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Total Sum of Squares:    4430000000
#> Residual Sum of Squares: 2380000000
#> R-Squared:      0.462
#> Adj. R-Squared: 0.434
#> F-statistic: 294.22 on 4 and 1369 DF, p-value: <0.0000000000000002

Each country then has its own intercept. The following table shows the estimated intercept for each country:

Country Intercept
Albania -31.90
Angola 2922.86
Argentina 4908.84
Australia -4325.83
Austria -113.23
Azerbaijan -755.41
Bangladesh -82.88
Belize 101.90
Bhutan -14.69
Bosnia and Herzegovina -29.97
Brazil 37366.12
Bulgaria -73.05
Burkina Faso 361.97
Burundi -0.97
Cambodia 1312.37
Cameroon 459.58
Canada -3322.76
China -6894.28
Colombia 561.04
Croatia -28.51
Ecuador 111.41
El Salvador 23.51
Ethiopia 302.05
France -506.15
Georgia -46.90
Germany -436.74
Greece -117.80
Guatemala 507.30
Honduras 164.12
Hungary -91.10
India -1593.02
Indonesia 7184.55
Italy -258.43
Japan -19.90
Kazakhstan -3977.91
Madagascar -67.42
Malawi 390.06
Mali -503.10
Mexico -3092.60
Morocco -386.88
Nepal 76.56
Nicaragua 933.36
Nigeria -1509.80
North Macedonia -17.28
Pakistan -161.72
Panama 90.16
Paraguay 3827.95
Peru 911.82
Philippines 135.07
Poland -229.00
Romania -294.74
Rwanda 2.05
Serbia -3.98
Slovenia -11.30
South Africa -785.29
Spain -467.89
Sri Lanka 34.59
Suriname 42.68
Switzerland -82.44
Tajikistan -83.02
Thailand -328.03
Togo -18.34
Turkey -616.51
Uganda 276.83
Ukraine -290.91
Uruguay 65.28
Zambia 997.16
Zimbabwe 266.02

Keep in mind that these effects are biased because the deforest variable does not isolate deforestation alone: it also captures forest regrowth (reforestation).

5. Panel with a Balanced data set

We will now run our regression on the balanced data set. This data set differs from merged in that it contains a complete set of observations for every year of the interval, which means that all “incomplete” countries are removed.

To do so, we first need to create a balanced data set from our “merged” data set.
Country Year agriland chagri chagrip agriexp foodexp forest deforest deforestp land pop soy opentr prod coal gas petrol nuclrew nuc rew
Albania 1996 11310 40 0.354 9.03 11.09 7771 19.5 -0.003 27400 -0.622 131 48.7 0.083 0.001 0.001 0.022 0.059 NA 0.059
Albania 1997 11350 40 0.352 14.06 11.11 7752 19.5 -0.003 27400 -0.625 76 50.0 0.073 0.000 0.001 0.021 0.051 NA 0.051
Albania 1998 11390 60 0.527 8.76 9.82 7732 19.5 -0.003 27400 -0.629 49 52.6 0.066 0.000 0.001 0.014 0.050 NA 0.050
Albania 1999 11450 -10 -0.087 4.62 5.53 7712 19.5 -0.003 27400 -0.633 523 55.5 0.068 0.000 0.001 0.013 0.053 NA 0.053
Albania 2000 11440 -50 -0.437 5.97 6.63 7693 0.0 0.002 27400 -0.637 292 63.5 0.061 0.000 0.001 0.013 0.046 NA 0.046
Albania 2001 11390 10 0.088 5.54 5.79 7706 0.0 0.002 27400 -0.938 389 66.5 0.051 0.000 0.001 0.014 0.036 NA 0.036
Albania 2002 11400 -190 -1.667 6.52 3.56 7719 0.0 0.002 27400 -0.300 200 68.5 0.051 0.000 0.001 0.015 0.035 NA 0.035
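The filtering step can be sketched in base R as follows. The `toy` data frame and its values are hypothetical; this only illustrates one way of keeping the countries observed in every year and is not necessarily how the balanced data set above was built.

```r
# Toy unbalanced panel (hypothetical values): country "B" lacks 1997.
toy <- data.frame(
  Country  = c("A", "A", "A", "B", "B", "C", "C", "C"),
  Year     = c(1996, 1997, 1998, 1996, 1998, 1996, 1997, 1998),
  deforest = c(19.5, 19.5, 19.5, 3.0, 2.0, 7.0, 8.0, 9.0)
)

# Number of distinct years in which each country is observed.
years_per_country <- tapply(toy$Year, toy$Country,
                            function(y) length(unique(y)))

# Keep only the countries observed in every year of the interval.
n_years  <- length(unique(toy$Year))
keep     <- names(years_per_country)[years_per_country == n_years]
balanced <- toy[toy$Country %in% keep, ]
```

Here country "B" is dropped because it has no observation for 1997, so `balanced` contains the same three years for every remaining country.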

Now that we have the balanced data set, we run the fixed effect regression on it.

#> Oneway (individual) effect Within Model
#> 
#> Call:
#> plm(formula = deforest ~ agriland + soy + prod, data = BalancedModel, 
#>     model = "within", index = c("Country", "Year"))
#> 
#> Balanced Panel: n = 40, T = 20, N = 800
#> 
#> Residuals:
#>      Min.   1st Qu.    Median   3rd Qu.      Max. 
#> -11487.74    -69.07     -3.03     41.11  11685.80 
#> 
#> Coefficients:
#>            Estimate Std. Error t-value            Pr(>|t|)    
#> agriland  0.0023185  0.0008775    2.64              0.0084 ** 
#> soy      -0.0006760  0.0000265  -25.51 <0.0000000000000002 ***
#> prod     45.9302896 11.7744632    3.90              0.0001 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Total Sum of Squares:    3100000000
#> Residual Sum of Squares: 1660000000
#> R-Squared:      0.464
#> Adj. R-Squared: 0.434
#> F-statistic: 218.584 on 3 and 757 DF, p-value: <0.0000000000000002

Looking more closely at the extraction variables, we end up with this model:

Balanced model

#> Oneway (individual) effect Within Model
#> 
#> Call:
#> plm(formula = deforest ~ agriland + soy + gas + petrol, data = BalancedModel, 
#>     model = "within", index = c("Country", "Year"))
#> 
#> Balanced Panel: n = 40, T = 20, N = 800
#> 
#> Residuals:
#>      Min.   1st Qu.    Median   3rd Qu.      Max. 
#> -11570.50    -95.35     -1.65     57.22  11700.20 
#> 
#> Coefficients:
#>             Estimate  Std. Error t-value             Pr(>|t|)    
#> agriland   0.0027167   0.0008874    3.06              0.00228 ** 
#> soy       -0.0007024   0.0000286  -24.59 < 0.0000000000000002 ***
#> gas      747.1629343 206.3489238    3.62              0.00031 ***
#> petrol   341.7205498 162.6322735    2.10              0.03596 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Total Sum of Squares:    3100000000
#> Residual Sum of Squares: 1650000000
#> R-Squared:      0.468
#> Adj. R-Squared: 0.438
#> F-statistic: 166.502 on 4 and 756 DF, p-value: <0.0000000000000002

The results of the regression did not change much compared with the unbalanced data set, although the significant extraction variables are now gas and petrol instead of petrol and rew. Nevertheless, when comparing the country fixed effects it is better to use a balanced data set, because it covers the same years for all countries:

Country Intercept
Albania -35.55
Argentina 3244.63
Australia -11252.45
Austria -120.56
Belize 92.15
Brazil 36140.11
Burundi -34.32
Canada -7103.84
China -13603.92
Colombia -64.75
Croatia -74.66
Ecuador 121.65
El Salvador 7.85
France -812.22
Georgia -74.29
Greece -217.06
Hungary -234.77
India -4458.14
Indonesia 2742.66
Italy -665.74
Japan 69.90
Kazakhstan -7024.33
Madagascar -687.97
Mexico -4835.41
Morocco -825.79
Nicaragua 859.16
Pakistan -1342.46
Panama 64.00
Paraguay 3882.10
Peru 520.35
Philippines -54.57
Romania -763.12
Slovenia -13.92
South Africa -2233.96
Suriname 45.99
Switzerland -39.61
Thailand -1299.88
Turkey -1109.65
Ukraine -1411.42
Uruguay -124.45

6. OLS per year

Another analysis that we can perform in order to compare our results is to apply OLS to one year at a time, which avoids the heteroskedasticity issue. Here we do this every five years from 2000 until 2015. We have to make a variable selection for every year.

For this purpose we will use the AIC criterion.
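A minimal sketch of AIC-based backward selection with base R's `step()` is shown below. It uses the built-in `mtcars` data as a stand-in for one year's cross-section, so the variables are unrelated to our study; the traces printed in this section have the same layout as `step()` output, but treating it as the exact function used is an assumption.

```r
# mtcars ships with R; it stands in for one year's cross-section here.
full <- lm(mpg ~ wt + hp + qsec + drat + gear, data = mtcars)

# Backward elimination: at each step drop the term whose removal
# lowers the AIC the most, and stop when no removal helps.
selected <- step(full, direction = "backward", trace = 0)

# extractAIC() returns c(edf, AIC) on the same scale step() prints.
aic_full     <- extractAIC(full)[2]
aic_selected <- extractAIC(selected)[2]

formula(selected)
```

Setting `trace = 1` instead prints one table per step, with the candidate removals sorted by the AIC they would lead to.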

2000

#> Start:  AIC=850
#> deforest ~ agriland + agriexp + foodexp + pop + soy + opentr + 
#>     coal + gas + petrol + rew
#> 
#>            Df Sum of Sq       RSS AIC
#> - foodexp   1    170968 244478503 848
#> - agriexp   1   1107666 245415201 848
#> - pop       1   1293472 245601007 848
#> - opentr    1   4737654 249045189 849
#> - petrol    1   7269416 251576951 849
#> <none>                  244307535 850
#> - agriland  1  84948368 329255903 864
#> - soy       1 111680299 355987834 868
#> - gas       1 132606879 376914414 871
#> - rew       1 169860810 414168345 876
#> - coal      1 374209214 618516749 898
#> 
#> Step:  AIC=848
#> deforest ~ agriland + agriexp + pop + soy + opentr + coal + gas + 
#>     petrol + rew
#> 
#>            Df Sum of Sq       RSS AIC
#> - agriexp   1   1008187 245486690 846
#> - pop       1   2063092 246541596 846
#> - opentr    1   4691041 249169544 847
#> - petrol    1   7124184 251602687 847
#> <none>                  244478503 848
#> - agriland  1  84786811 329265314 862
#> - soy       1 118978000 363456503 867
#> - gas       1 133252625 377731129 869
#> - rew       1 172594396 417072899 874
#> - coal      1 374121726 618600229 896
#> 
#> Step:  AIC=846
#> deforest ~ agriland + pop + soy + opentr + coal + gas + petrol + 
#>     rew
#> 
#>            Df Sum of Sq       RSS AIC
#> - pop       1   3550916 249037606 845
#> - opentr    1   4096307 249582997 845
#> - petrol    1   6467281 251953971 845
#> <none>                  245486690 846
#> - agriland  1  84745434 330232124 860
#> - soy       1 118400506 363887196 865
#> - gas       1 132281474 377768164 867
#> - rew       1 172498198 417984888 873
#> - coal      1 373184642 618671332 894
#> 
#> Step:  AIC=845
#> deforest ~ agriland + soy + opentr + coal + gas + petrol + rew
#> 
#>            Df Sum of Sq       RSS AIC
#> - opentr    1   3564493 252602099 843
#> - petrol    1   7894332 256931938 844
#> <none>                  249037606 845
#> - agriland  1  85300173 334337779 858
#> - soy       1 119878890 368916496 864
#> - gas       1 134644663 383682269 866
#> - rew       1 169742187 418779793 871
#> - coal      1 384359880 633397486 893
#> 
#> Step:  AIC=843
#> deforest ~ agriland + soy + coal + gas + petrol + rew
#> 
#>            Df Sum of Sq       RSS AIC
#> - petrol    1   7124165 259726264 843
#> <none>                  252602099 843
#> - agriland  1  81895163 334497262 857
#> - soy       1 116841426 369443525 862
#> - gas       1 132012614 384614713 864
#> - rew       1 167981563 420583662 869
#> - coal      1 381962883 634564983 891
#> 
#> Step:  AIC=843
#> deforest ~ agriland + soy + coal + gas + rew
#> 
#>            Df Sum of Sq       RSS AIC
#> <none>                  259726264 843
#> - agriland  1  96533561 356259825 858
#> - soy       1 130453075 390179340 863
#> - gas       1 136903263 396629527 864
#> - rew       1 165767561 425493825 868
#> - coal      1 375356522 635082786 889
#> 
#> Call:
#> lm(formula = deforest ~ agriland + soy + coal + gas + rew, data = merged2000)
#> 
#> Coefficients:
#> (Intercept)     agriland          soy         coal          gas  
#>  -290.62442      0.00241      0.00261  -1009.08325  -2337.90840  
#>         rew  
#>  4432.29982
Observations 54
Dependent variable deforest
Type OLS linear regression
F(5,48) 46.95
R² 0.83
Adj. R² 0.81
Est. S.E. t val. p
(Intercept) -290.62 376.26 -0.77 0.44
agriland 0.00 0.00 4.22 0.00
soy 0.00 0.00 4.91 0.00
coal -1009.08 121.16 -8.33 0.00
gas -2337.91 464.79 -5.03 0.00
rew 4432.30 800.79 5.53 0.00
Standard errors: OLS
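The AIC column of the selection trace can be checked by hand from the residual sum of squares. The sketch below uses the figures of the final 2000 model; the formula is the one used for linear models on this scale (defined up to an additive constant).

```r
# Figures copied from the final step of the 2000 selection above.
n   <- 54          # observations in the 2000 cross-section
rss <- 259726264   # residual sum of squares of the selected model
k   <- 6           # estimated coefficients: 5 slopes + intercept

# AIC on the scale used in the stepwise table.
aic_2000 <- n * log(rss / n) + 2 * k
round(aic_2000)    # 843, the value shown in the last step
```

A variable is therefore dropped only when the increase in RSS it causes is small enough to be outweighed by the penalty of 2 per coefficient.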

2005

#> Start:  AIC=896
#> deforest ~ agriland + agriexp + foodexp + pop + soy + opentr + 
#>     coal + gas + petrol + rew
#> 
#>            Df Sum of Sq       RSS AIC
#> - pop       1    204871 201671380 894
#> - foodexp   1    531742 201998251 894
#> - agriexp   1   2375283 203841793 894
#> - opentr    1   3711903 205178412 895
#> <none>                  201466510 896
#> - petrol    1   7246550 208713060 896
#> - agriland  1  53814648 255281158 907
#> - soy       1 106121138 307587648 918
#> - gas       1 130675199 332141708 923
#> - rew       1 145487066 346953575 925
#> - coal      1 345941051 547407560 951
#> 
#> Step:  AIC=894
#> deforest ~ agriland + agriexp + foodexp + soy + opentr + coal + 
#>     gas + petrol + rew
#> 
#>            Df Sum of Sq       RSS AIC
#> - foodexp   1   1087474 202758855 892
#> - agriexp   1   3200144 204871524 892
#> - opentr    1   3513338 205184718 893
#> <none>                  201671380 894
#> - petrol    1   7567944 209239324 894
#> - agriland  1  55377392 257048772 906
#> - soy       1 106880750 308552130 916
#> - gas       1 132038171 333709551 921
#> - rew       1 145747642 347419023 923
#> - coal      1 351183065 552854446 950
#> 
#> Step:  AIC=892
#> deforest ~ agriland + agriexp + soy + opentr + coal + gas + petrol + 
#>     rew
#> 
#>            Df Sum of Sq       RSS AIC
#> - agriexp   1   3169308 205928162 891
#> - opentr    1   3581461 206340316 891
#> <none>                  202758855 892
#> - petrol    1   7225043 209983897 892
#> - agriland  1  54417987 257176842 904
#> - soy       1 117488806 320247661 916
#> - gas       1 131249171 334008026 919
#> - rew       1 145433456 348192311 921
#> - coal      1 350250666 553009520 948
#> 
#> Step:  AIC=891
#> deforest ~ agriland + soy + opentr + coal + gas + petrol + rew
#> 
#>            Df Sum of Sq       RSS AIC
#> - opentr    1   2167937 208096099 889
#> - petrol    1   6999698 212927861 891
#> <none>                  205928162 891
#> - agriland  1  52829459 258757622 902
#> - soy       1 117533712 323461875 915
#> - gas       1 130700665 336628828 917
#> - rew       1 143569574 349497736 919
#> - coal      1 347720074 553648236 946
#> 
#> Step:  AIC=889
#> deforest ~ agriland + soy + coal + gas + petrol + rew
#> 
#>            Df Sum of Sq       RSS AIC
#> <none>                  208096099 889
#> - petrol    1   7332843 215428942 889
#> - agriland  1  50712673 258808773 900
#> - soy       1 117250862 325346961 913
#> - gas       1 129781791 337877890 916
#> - rew       1 141402976 349499076 917
#> - coal      1 350820672 558916771 945
#> 
#> Call:
#> lm(formula = deforest ~ agriland + soy + coal + gas + petrol + 
#>     rew, data = merged2005)
#> 
#> Coefficients:
#> (Intercept)     agriland          soy         coal          gas  
#>  -253.37304      0.00190      0.00143   -623.76018  -2505.59588  
#>      petrol          rew  
#>   376.59273   3974.85188
Observations 58
Dependent variable deforest
Type OLS linear regression
F(10,47) 31.12
R² 0.87
Adj. R² 0.84
Est. S.E. t val. p
(Intercept) -1287.79 973.02 -1.32 0.19
agriland 0.00 0.00 3.54 0.00
agriexp 21.63 29.06 0.74 0.46
foodexp 4.82 13.70 0.35 0.73
pop 70.11 320.68 0.22 0.83
soy 0.00 0.00 4.98 0.00
opentr 9.14 9.82 0.93 0.36
coal -636.60 70.86 -8.98 0.00
gas -2524.16 457.17 -5.52 0.00
petrol 377.50 290.34 1.30 0.20
rew 4119.52 707.11 5.83 0.00
Standard errors: OLS
Observations 58
Dependent variable deforest
Type OLS linear regression
F(5,52) 63.73
R² 0.86
Adj. R² 0.85
Est. S.E. t val. p
(Intercept) -252.60 314.65 -0.80 0.43
agriland 0.00 0.00 3.89 0.00
soy 0.00 0.00 5.44 0.00
coal -609.80 66.97 -9.11 0.00
gas -2217.94 391.99 -5.66 0.00
rew 4093.25 674.52 6.07 0.00
Standard errors: OLS

2010

Observations 62
Dependent variable deforest
Type OLS linear regression
F(10,51) 7.31
R² 0.59
Adj. R² 0.51
Est. S.E. t val. p
(Intercept) -304.13 830.82 -0.37 0.72
agriland -0.00 0.00 -0.70 0.49
agriexp 34.66 73.73 0.47 0.64
foodexp -12.76 11.96 -1.07 0.29
pop 431.02 242.90 1.77 0.08
soy 0.00 0.00 4.59 0.00
opentr 5.72 8.08 0.71 0.48
coal -67.99 53.41 -1.27 0.21
gas 60.60 360.96 0.17 0.87
petrol 254.19 226.21 1.12 0.27
rew -88.82 484.92 -0.18 0.86
Standard errors: OLS
#> Start:  AIC=932
#> deforest ~ agriland + agriexp + foodexp + pop + soy + opentr + 
#>     coal + gas + petrol + rew
#> 
#>            Df Sum of Sq       RSS AIC
#> - gas       1     80976 146621183 930
#> - rew       1     96397 146636604 930
#> - agriexp   1    634713 147174919 930
#> - agriland  1   1414602 147954808 930
#> - opentr    1   1440352 147980559 930
#> - foodexp   1   3272957 149813164 931
#> - petrol    1   3628044 150168251 931
#> - coal      1   4656114 151196320 932
#> <none>                  146540206 932
#> - pop       1   9047316 155587522 934
#> - soy       1  60575748 207115954 951
#> 
#> Step:  AIC=930
#> deforest ~ agriland + agriexp + foodexp + pop + soy + opentr + 
#>     coal + petrol + rew
#> 
#>            Df Sum of Sq       RSS AIC
#> - rew       1     56354 146677537 928
#> - agriexp   1    731050 147352232 928
#> - agriland  1   1335612 147956795 928
#> - opentr    1   1393059 148014242 929
#> - foodexp   1   3193763 149814946 929
#> <none>                  146621183 930
#> - coal      1   5324979 151946162 930
#> - petrol    1   5708104 152329287 930
#> - pop       1   9129825 155751008 932
#> - soy       1  68875025 215496208 952
#> 
#> Step:  AIC=928
#> deforest ~ agriland + agriexp + foodexp + pop + soy + opentr + 
#>     coal + petrol
#> 
#>            Df Sum of Sq       RSS AIC
#> - agriexp   1    727189 147404726 926
#> - agriland  1   1297236 147974773 926
#> - opentr    1   1511324 148188861 927
#> - foodexp   1   3146700 149824237 927
#> <none>                  146677537 928
#> - petrol    1   6712240 153389777 929
#> - pop       1   9679070 156356607 930
#> - coal      1  11661642 158339179 931
#> - soy       1 119135427 265812964 963
#> 
#> Step:  AIC=926
#> deforest ~ agriland + foodexp + pop + soy + opentr + coal + petrol
#> 
#>            Df Sum of Sq       RSS AIC
#> - opentr    1   1117455 148522181 925
#> - agriland  1   1607845 149012571 925
#> - foodexp   1   3203654 150608380 926
#> <none>                  147404726 926
#> - petrol    1   6246192 153650918 927
#> - coal      1  11197501 158602227 929
#> - pop       1  11746323 159151049 929
#> - soy       1 121420241 268824967 962
#> 
#> Step:  AIC=925
#> deforest ~ agriland + foodexp + pop + soy + coal + petrol
#> 
#>            Df Sum of Sq       RSS AIC
#> - agriland  1   2191152 150713333 924
#> - foodexp   1   2947514 151469695 924
#> <none>                  148522181 925
#> - petrol    1   6340946 154863127 925
#> - coal      1  10575501 159097682 927
#> - pop       1  10734384 159256565 927
#> - soy       1 120737060 269259241 960
#> 
#> Step:  AIC=924
#> deforest ~ foodexp + pop + soy + coal + petrol
#> 
#>           Df Sum of Sq       RSS AIC
#> - foodexp  1   2211706 152925038 923
#> <none>                 150713333 924
#> - petrol   1   5029659 155742992 924
#> - pop      1   9261200 159974532 925
#> - coal     1  32852682 183566015 934
#> - soy      1 134955558 285668890 961
#> 
#> Step:  AIC=923
#> deforest ~ pop + soy + coal + petrol
#> 
#>          Df Sum of Sq       RSS AIC
#> <none>                152925038 923
#> - pop     1   7131965 160057003 923
#> - petrol  1   8784057 161709095 924
#> - coal    1  33359387 186284425 933
#> - soy     1 136949423 289874462 960
#> 
#> Call:
#> lm(formula = deforest ~ pop + soy + coal + petrol, data = merged2010)
#> 
#> Coefficients:
#> (Intercept)          pop          soy         coal       petrol  
#>    4.957774   311.676234     0.000445   -93.275409   269.155894
Observations 62
Dependent variable deforest
Type OLS linear regression
F(4,57) 17.64
R² 0.55
Adj. R² 0.52
Est. S.E. t val. p
(Intercept) 385.18 249.44 1.54 0.13
agriland -0.00 0.00 -0.49 0.63
soy 0.00 0.00 6.42 0.00
petrol 344.86 156.92 2.20 0.03
coal -88.58 35.35 -2.51 0.02
Standard errors: OLS

2015

Observations 60
Dependent variable deforest
Type OLS linear regression
F(10,49) 29.26
R² 0.86
Adj. R² 0.83
Est. S.E. t val. p
(Intercept) -246.52 545.20 -0.45 0.65
agriland 0.00 0.00 0.24 0.81
agriexp 29.90 43.16 0.69 0.49
foodexp -15.57 7.54 -2.07 0.04
pop 426.21 145.17 2.94 0.01
soy 0.00 0.00 7.80 0.00
opentr 4.47 4.89 0.91 0.37
coal -72.92 43.16 -1.69 0.10
gas -800.40 232.04 -3.45 0.00
petrol 442.36 141.22 3.13 0.00
rew 112.26 262.41 0.43 0.67
Standard errors: OLS
#> Start:  AIC=840
#> deforest ~ agriland + agriexp + foodexp + pop + soy + opentr + 
#>     coal + gas + petrol + rew
#> 
#>            Df Sum of Sq       RSS AIC
#> - agriland  1     56677  50129814 838
#> - rew       1    187044  50260181 838
#> - agriexp   1    490370  50563506 839
#> - opentr    1    852265  50925402 839
#> <none>                   50073137 840
#> - coal      1   2916640  52989776 841
#> - foodexp   1   4362420  54435556 843
#> - pop       1   8808754  58881890 848
#> - petrol    1  10027228  60100365 849
#> - gas       1  12159401  62232537 851
#> - soy       1  62211731 112284868 887
#> 
#> Step:  AIC=838
#> deforest ~ agriexp + foodexp + pop + soy + opentr + coal + gas + 
#>     petrol + rew
#> 
#>           Df Sum of Sq       RSS AIC
#> - rew      1    131813  50261627 836
#> - agriexp  1    447683  50577497 837
#> - opentr   1    795652  50925465 837
#> <none>                  50129814 838
#> - foodexp  1   4714593  54844406 842
#> - coal     1   4799492  54929306 842
#> - pop      1   8898482  59028296 846
#> - petrol   1  10471282  60601095 848
#> - gas      1  12522224  62652037 850
#> - soy      1 106792262 156922076 905
#> 
#> Step:  AIC=836
#> deforest ~ agriexp + foodexp + pop + soy + opentr + coal + gas + 
#>     petrol
#> 
#>           Df Sum of Sq       RSS AIC
#> - agriexp  1    441975  50703602 835
#> - opentr   1    797652  51059279 835
#> <none>                  50261627 836
#> - foodexp  1   4800978  55062605 840
#> - pop      1   8843081  59104708 844
#> - gas      1  12490097  62751724 848
#> - petrol   1  12512199  62773825 848
#> - coal     1  14550360  64811987 850
#> - soy      1 157428513 207690140 919
#> 
#> Step:  AIC=835
#> deforest ~ foodexp + pop + soy + opentr + coal + gas + petrol
#> 
#>           Df Sum of Sq       RSS AIC
#> - opentr   1    650822  51354424 834
#> <none>                  50703602 835
#> - foodexp  1   4459965  55163566 838
#> - pop      1   8828736  59532337 842
#> - petrol   1  12261481  62965083 846
#> - gas      1  12277841  62981443 846
#> - coal     1  14908037  65611639 848
#> - soy      1 157711177 208414779 918
#> 
#> Step:  AIC=834
#> deforest ~ foodexp + pop + soy + coal + gas + petrol
#> 
#>           Df Sum of Sq       RSS AIC
#> <none>                  51354424 834
#> - foodexp  1   4566200  55920624 837
#> - pop      1   8238621  59593045 841
#> - petrol   1  12386234  63740658 845
#> - gas      1  13524818  64879242 846
#> - coal     1  14947366  66301790 847
#> - soy      1 161346828 212701251 917
#> 
#> Call:
#> lm(formula = deforest ~ foodexp + pop + soy + coal + gas + petrol, 
#>     data = merged2015)
#> 
#> Coefficients:
#> (Intercept)      foodexp          pop          soy         coal  
#>  247.758472   -15.421249   376.988059     0.000461   -56.861524  
#>         gas       petrol  
#> -806.270953   459.842415
Observations 60
Dependent variable deforest
Type OLS linear regression
F(5,54) 51.45
R² 0.83
Adj. R² 0.81
Est. S.E. t val. p
(Intercept) 272.78 163.09 1.67 0.10
agriland 0.00 0.00 0.47 0.64
soy 0.00 0.00 10.29 0.00
coal -65.09 21.15 -3.08 0.00
gas -952.28 232.36 -4.10 0.00
petrol 599.03 129.02 4.64 0.00
Standard errors: OLS

Resulting coefficients

Now that we have a selected model for each year, we can compare their coefficients. Taking the observations one year at a time avoids the heteroskedasticity problem that we had before. We fitted one model every five years from 2000 to 2015 and used the AIC criterion to guide the variable selection. The four models have a good R-squared (between 0.55 and 0.86), and almost all retained coefficients are significant at the 95% level; the exception is agriland, which is not significant in 2010 and 2015. The coefficients do not vary much across years (except for coal and agriland), which is encouraging. Ideally all coefficients would have been positive, but when we remove the negative ones the model becomes insignificant. The year 2005 seems to be the most relevant one to compare with the pooled version of the panel data (OLS) and with the fixed effect model that we obtained.

7. Comparison of the Results

Now that we have performed all our regressions, here is a plot of the coefficients from the four main regressions. By plotting the coefficients of the three methods used, we can see that the coefficients of the fixed effect panel regressions are much closer to zero than the OLS ones. They are also more realistic, because the coefficients of the extraction variables are positive. We know that OLS is biased because of its correlated error term, so even though it gives a more complete model, we prefer the panel fixed effect regression to interpret the coefficients and finally answer our initial research questions.

Pooling regression (OLS)
term estimate std.error statistic p.value
(Intercept) -4185.5702 578.4309 -7.24 0.0000
agriland -0.0022 0.0004 -6.01 0.0000
agriexp 2151.3341 196.8128 10.93 0.0000
foodexp 248.8179 28.3780 8.77 0.0000
pop 777.3459 296.3613 2.62 0.0090
soy -0.0002 0.0001 -2.17 0.0308
gas -5379.9487 367.9605 -14.62 0.0000
petrol 1585.3310 212.0827 7.48 0.0000
nuc -684.4593 266.4543 -2.57 0.0105
rew 3189.2867 274.7572 11.61 0.0000
OLS per year (2005)
term estimate std.error statistic p.value
(Intercept) -252.5996 314.6505 -0.803 0.4257
agriland 0.0021 0.0005 3.888 0.0003
soy 0.0015 0.0003 5.439 0.0000
coal -609.7985 66.9664 -9.106 0.0000
gas -2217.9449 391.9918 -5.658 0.0000
rew 4093.2531 674.5194 6.068 0.0000
Fixed effects with full sample
term estimate std.error statistic p.value
agriland 0.0013 0.0006 2.13 0.0337
soy -0.0007 0.0000 -32.37 0.0000
petrol 506.1272 106.2960 4.76 0.0000
rew 177.1797 63.0472 2.81 0.0050
Fixed effects with Balanced data
term estimate std.error statistic p.value
agriland 0.0027 0.0009 3.06 0.0023
soy -0.0007 0.0000 -24.59 0.0000
gas 747.1629 206.3489 3.62 0.0003
petrol 341.7205 162.6323 2.10 0.0360

These tables show the significance level of each model. In all four regressions the slope coefficients are significant at the 95% level (no p-value above 0.05); only the intercept of the yearly OLS model is not significant. We can now compare the R-squared values.

Model R-squared
OLS model 0.524
OLS2005 0.859
Fixed effect 0.434
Balanced 0.438

The R-squared is very high only in the OLS 2005 model. For that year we can therefore identify which effects on deforestation were statistically significant, together with their weight. The last two models have a rather low R-squared, which is realistic given that an omitted variable might explain deforestation better. Note that the values reported for the two fixed effect models (0.434 and 0.438) are adjusted R-squared values; the corresponding unadjusted values are 0.462 and 0.468.
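The link between the R-squared and the adjusted R-squared of the two fixed effect models can be checked with a few lines of base R. This is a sketch using the figures printed in the regression outputs; the adjustment formula is the one that reproduces the reported values, and the column names are illustrative.

```r
# Figures copied from the two fixed effect outputs above.
models <- data.frame(
  model  = c("Fixed effect", "Balanced"),
  r2     = c(0.462, 0.468),  # reported R-Squared
  n_obs  = c(1441, 800),     # N in the panel header
  df_res = c(1369, 756)      # residual degrees of freedom of the F-statistic
)

# Adjusted R-squared penalises the number of estimated parameters
# (here the slopes plus one intercept per country).
models$adj_r2 <- 1 - (1 - models$r2) * (models$n_obs - 1) / models$df_res
round(models$adj_r2, 3)  # 0.434 0.438
```

The gap between the two measures is larger here than in the yearly OLS models because each country intercept uses up one degree of freedom.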

8. Answers to the research questions

From the balanced data set we can first capture the effect of agricultural land on deforestation. According to our results, an increase of 1 square kilometer of agricultural land implies, ceteris paribus, an increase in deforestation of 0.0027 square kilometers. Likewise, if soy production decreases by one unit (thousand dollars), deforestation increases by 0.0007024 sq km, and an increase of one quad of gas production increases deforestation by 747.1629343 sq km. Finally, an increase of one quad of petrol production leads to an increase in deforestation of 341.7205498 sq km.

We can estimate the amount of deforestation for a country if we know its agricultural land in sq km, its soy production in thousand dollars, and its gas and petrol production in quads. The estimated amount of deforestation in sq km can be calculated with this formula:

\[ \text{Deforestation} = 0.0027167 \cdot \text{agricultural land} - 0.0007024 \cdot \text{soy production} + 747.1629343 \cdot \text{gas} + 341.7205498 \cdot \text{petrol} \]

This formula captures only the part explained by the four regressors; to obtain a level for a given country, the country-specific intercept reported in the balanced fixed effect table has to be added. Because the adjusted R-squared of this model is 0.438, we can explain only approximately 44% of the observed variation.
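The formula can be wrapped in a small helper, sketched below in base R. The function name `predict_deforest` and its `country_effect` argument are hypothetical; the coefficients are those of the balanced fixed effect model, and the inputs in the example call are made up.

```r
# Coefficients of the balanced fixed effect model reported above.
coefs <- c(agriland = 0.0027167, soy = -0.0007024,
           gas = 747.1629343, petrol = 341.7205498)

# `country_effect` is a hypothetical argument: the country's intercept
# from the balanced fixed effect table (0 leaves it out, as in the formula).
predict_deforest <- function(agriland, soy, gas, petrol, country_effect = 0) {
  country_effect +
    coefs[["agriland"]] * agriland +
    coefs[["soy"]] * soy +
    coefs[["gas"]] * gas +
    coefs[["petrol"]] * petrol
}

# Made-up inputs, only to show the call.
predict_deforest(agriland = 100000, soy = 1e6, gas = 1, petrol = 1)
```

Given the modest fit of the model, such a value should be read as a rough indication rather than as a forecast.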

Conclusion

To conclude our report, we briefly go through our main steps and results and explain how they could have been improved.

The main goal of our research was to identify which countries are the most affected by deforestation, how this differs over time, and then to detect the possible causes of deforestation, not only in a specific region but around the globe. For this purpose we used several data sets from different sources: the FAO, the World Bank (data.worldbank), Our World in Data (ourworldindata) and the EIA. In each of them we looked closely at what could be potential causes of deforestation and selected our variables accordingly. In order to use these variables, we tidied all our data sets and merged them into a single one.

Then we took a deeper look at our data, exploring the distributions and the extreme values, and we mapped the changes worldwide to better visualize deforestation. Already in this part we discovered which countries are the most affected by deforestation: Brazil, Angola, Indonesia, Argentina, Peru, Colombia, Nigeria, Paraguay, Zambia and Cambodia. The large growth of forest area in China and also in India was surprising, and it was also a problem for the regressions we planned to run in the analysis. We verified whether the extreme values we found really corresponded to extreme deforestation (or reforestation), to make sure there was no major mistake in the data we were using. We also saw that deforestation has a significant positive correlation with agricultural land, gas production, petrol production, population growth, production of renewables and soy production, and a significant negative correlation with nuclear production and openness to trade. Finally, we discovered some correlation between the variables we wanted to use in the regressions, which we also had to take into account.

In the analysis, we tried many different methods and ended up showing the four main ones. The first is the OLS regression, which we found important to show because it is the most common method, and without analyzing it we could not have been sure whether using it was appropriate. We were able to verify whether the assumptions were met, and unfortunately the error term appeared to be heteroskedastic. To deal with this, we used a panel data regression. Nevertheless, our results were not completely satisfying: even though the unobservable effect of time was taken into consideration, we had very few significant variables. It is possible that an omitted variable was captured in the error term. Then, by using a balanced version of our data set, we ended up with very similar results, but they pointed to different extraction variables as causes of deforestation. Finally, the result we obtained by applying OLS per year was very significant, but only for one year, and the sign of the resulting coefficient was not in line with our expectation.

To conclude, we have to say that we chose a difficult topic to analyse. From the search for data sets through the tidying to the analysis, this subject proved to be very challenging. Other studies have been done on this subject, but they mostly used satellite pictures to analyse deforestation. Taking the forest area in square kilometers and using the difference from one year to the next as our deforestation variable first seemed a great idea to us. But we completely underestimated the effect of the “reforestation” carried out by many countries, which makes their forest area grow from one year to the next. This effect biases our analysis, as our dependent variable contains two opposite effects in one. An alternative would have been to take only the area of primary forest. As primary forest cannot be replanted, this would have been a good way to separate the two effects. However, the data set we found on primary forest was very weak compared to the one we finally chose, so we cannot be certain that it would have given better results.
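The mixing of the two effects can be illustrated with a small Python sketch. It computes the year-to-year difference in forest area and then splits each net change into a loss part and a gain part. The function names, the sign convention (a negative difference means net loss) and the forest areas are assumptions made for illustration only, not the code or data used in this report. Note also that such a split only separates net loss from net gain per country and year; it cannot recover clearing and replanting that happen within the same country in the same year.

```python
def yearly_forest_change(area_by_year):
    """Return {year: area - area of the previous available year}."""
    years = sorted(area_by_year)
    return {
        year: area_by_year[year] - area_by_year[prev]
        for prev, year in zip(years, years[1:])
    }


def split_change(change):
    """Split a net change into (loss, gain), both non-negative."""
    return (max(-change, 0.0), max(change, 0.0))


# Hypothetical forest areas in square kilometers for one country.
forest_area = {2000: 5000.0, 2001: 4900.0, 2002: 4950.0}

changes = yearly_forest_change(forest_area)
parts = {year: split_change(delta) for year, delta in changes.items()}

for year in sorted(parts):
    loss, gain = parts[year]
    print(year, "net change:", changes[year], "loss:", loss, "gain:", gain)
```

With a single net-change variable, the 2001 loss and the 2002 gain in this example would enter the regression as one series with opposite signs, which is the problem described above.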

Resources

Arima E.Y., Richards P., Walker R., Caldas M.C.: Statistical confirmation of indirect land use change in the Brazilian Amazon. Environmental Research Letters, 6 (2) (2011), p. 7.

Carr D., Davis J., Jankowska M.M., Grant L., Carr A.C., Clark M.: Space versus place in complex human–natural systems: Spatial and multi-level models of tropical land use and cover change (LUCC) in Guatemala. Ecological Modelling, 229 (2012), pp. 64-75.

Ehrhardt-Martinez K., Crenshaw E., Jenkins J.C.: Deforestation and the environmental Kuznets curve: A cross-national investigation of intervening mechanisms. Social Science Quarterly, 83 (1) (2002), pp. 226-243.

Faria W.R., Almeida A.N.: Relationship between openness to trade and deforestation: Empirical evidence from the Brazilian Amazon. Ecological Modelling, 121 (2016), pp. 85-97.

National Geographic (2021): Encyclopedia. Deforestation. [online]: Deforestation | National Geographic Society

Pfaff A., Robalino J., Walker R., Aldrich S., Caldas M., Reis E., et al.:Road inv